ABSTRACT


This thesis studies the Behrens-Fisher test for the Behrens-Fisher problem. Fisher's hypotheses and his proposed methods of verification of the test are discussed. A sampling study on the actual size of the test based on Fisher's procedures is conducted. Actual sizes for different parameter values are obtained and tabulated. These results are discussed along with those obtained by other investigators. It is shown that the actual sizes as calculated using Fisher's proposed methods are close to the nominal sizes of the test. Furthermore, it is seen that the actual sizes for larger degrees of freedom agree more closely with the nominal sizes than those for smaller degrees of freedom. The test is thus recommended for use because its actual size is close to the size specified by the user.

CHAPTER I. INTRODUCTION


I. Statement of the Problem

Suppose samples of sizes $n_1 + 1$, $n_2 + 1$ are taken separately from two normal populations with true means $\mu_1$, $\mu_2$ and true variances $\sigma_1^2$ and $\sigma_2^2$. Let the means of the two samples be $\bar{x}_1$ and $\bar{x}_2$ and their variances be $(n_1 + 1)S_1^2$ and $(n_2 + 1)S_2^2$. On the basis of these samples, we want to test the hypothesis that there is no difference between the means of the two populations. That is, we want to test the hypothesis $H_0: \mu_1 = \mu_2$.

Two cases may occur:

(i) $\sigma_1^2$ and $\sigma_2^2$ are known or equal or in a known ratio.

(ii) $\sigma_1^2$ and $\sigma_2^2$ are not known or not in a known ratio.

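For case (i), the problem of the test of hypothesis is easily solved. There are three situations:

(a) $\sigma_1^2$ and $\sigma_2^2$ known.

Now $\bar{x}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1 + 1}\right)$, $\bar{x}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2 + 1}\right)$.*

Let $d = \bar{x}_1 - \bar{x}_2$. It is easily shown that:

\[ E(d) = \mu_1 - \mu_2, \qquad \mathrm{var}(d) = \frac{\sigma_1^2}{n_1 + 1} + \frac{\sigma_2^2}{n_2 + 1}. \]

* where $x \sim N(\mu, \sigma^2)$ is read "x has the normal distribution with mean $\mu$ and variance $\sigma^2$".

If $\sigma_1^2$ and $\sigma_2^2$ are known, the statistic

\[ v = \frac{d - (\mu_1 - \mu_2)}{\left(\frac{\sigma_1^2}{n_1 + 1} + \frac{\sigma_2^2}{n_2 + 1}\right)^{1/2}} \sim N(0,1), \]

and hence the test of hypothesis is easily carried out using the standard normal table.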

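(b) $\sigma_1^2 = \sigma_2^2 = \sigma^2$ but the value of $\sigma^2$ is not known. Use the t test to test the significance of the difference between the two means as follows:

Let
\[ S_1^2 = \frac{\sum_{i=1}^{n_1+1} (x_{1i} - \bar{x}_1)^2}{n_1(n_1 + 1)}, \qquad S_2^2 = \frac{\sum_{i=1}^{n_2+1} (x_{2i} - \bar{x}_2)^2}{n_2(n_2 + 1)}. \]

We know that
\[ \frac{n_1(n_1 + 1)S_1^2}{\sigma_1^2}, \qquad \frac{n_2(n_2 + 1)S_2^2}{\sigma_2^2} \]
are independently distributed as $\chi^2$ with $n_1$ and $n_2$ degrees of freedom respectively. If $\sigma_1^2 = \sigma_2^2 = \sigma^2$,
\[ \frac{n_1(n_1 + 1)S_1^2 + n_2(n_2 + 1)S_2^2}{\sigma^2} \]
is distributed as $\chi^2$ with $(n_1 + n_2)$ degrees of freedom.

Now
\[ \frac{d - (\mu_1 - \mu_2)}{\sigma\left(\frac{1}{n_1 + 1} + \frac{1}{n_2 + 1}\right)^{1/2}} \sim N(0,1). \]

Let
\[ t = \frac{d - (\mu_1 - \mu_2)}{\sigma\left(\frac{1}{n_1 + 1} + \frac{1}{n_2 + 1}\right)^{1/2}} \left[\frac{n_1(n_1 + 1)S_1^2 + n_2(n_2 + 1)S_2^2}{\sigma^2 (n_1 + n_2)}\right]^{-1/2} = \frac{d - (\mu_1 - \mu_2)}{\left(\frac{1}{n_1 + 1} + \frac{1}{n_2 + 1}\right)^{1/2} \left[\frac{n_1(n_1 + 1)S_1^2 + n_2(n_2 + 1)S_2^2}{n_1 + n_2}\right]^{1/2}}. \tag{1.1} \]

$t$, which does not now involve the unknown $\sigma^2$, is then distributed with Student's t distribution with $n_1 + n_2$ degrees of freedom.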
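
The statistic (1.1) can be computed directly from two raw samples. The following is a minimal sketch under the equal-variance assumption of case (b); the function name `pooled_t` and the example data are illustrative assumptions and are not part of the original study.

```python
import math

def pooled_t(x1, x2, delta=0.0):
    """Statistic (1.1) for H0: mu1 - mu2 = delta, assuming equal variances.

    Hypothetical helper for illustration. Each sample holds n_i + 1
    observations, so n_i is its number of degrees of freedom.
    """
    m1, m2 = len(x1), len(x2)      # m_i = n_i + 1 observations
    n1, n2 = m1 - 1, m2 - 1        # degrees of freedom
    xbar1 = sum(x1) / m1
    xbar2 = sum(x2) / m2
    # S_i^2 = sum of squares / (n_i (n_i + 1)): estimated variance of the mean
    s1_sq = sum((x - xbar1) ** 2 for x in x1) / (n1 * m1)
    s2_sq = sum((x - xbar2) ** 2 for x in x2) / (n2 * m2)
    d = xbar1 - xbar2
    # Pooled estimate of the common variance sigma^2 on n1 + n2 df
    pooled = (n1 * m1 * s1_sq + n2 * m2 * s2_sq) / (n1 + n2)
    t = (d - delta) / math.sqrt(pooled * (1 / m1 + 1 / m2))
    return t, n1 + n2

t_value, df = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_value, 4), df)
```

The returned value is referred to Student's t table with $n_1 + n_2$ degrees of freedom.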

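(c) If $\sigma_1^2$ and $\sigma_2^2$ are in a known ratio, say $\theta = \sigma_1^2/\sigma_2^2$ is known, the t statistic can still be used to perform the test. This is shown as follows:

By analogy with (1.1), if $\sigma_1^2 \neq \sigma_2^2$, we have

\[ t = \frac{d - \mu_1 + \mu_2}{\left(\frac{\sigma_1^2}{n_1 + 1} + \frac{\sigma_2^2}{n_2 + 1}\right)^{1/2}} \left[\frac{\frac{n_1(n_1 + 1)S_1^2}{\sigma_1^2} + \frac{n_2(n_2 + 1)S_2^2}{\sigma_2^2}}{n_1 + n_2}\right]^{-1/2} \tag{1.2} \]

is distributed as $t$ with $(n_1 + n_2)$ degrees of freedom.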


Define $u = S_1^2/S_2^2$, $N = (n_1 + 1)/(n_2 + 1)$ and with $\theta = \sigma_1^2/\sigma_2^2$, we can rewrite (1.2) as

\[ t = \frac{(d - \mu_1 + \mu_2)(n_1 + n_2)^{1/2}}{\left(\frac{\sigma_1^2}{n_1 + 1} + \frac{\sigma_2^2}{n_2 + 1}\right)^{1/2} \left(\frac{n_1(n_1 + 1)S_1^2}{\sigma_1^2} + \frac{n_2(n_2 + 1)S_2^2}{\sigma_2^2}\right)^{1/2}} \]

\[ = \frac{(d - \mu_1 + \mu_2)(n_1 + n_2)^{1/2}}{\frac{\sigma_2}{(n_2 + 1)^{1/2}}\left(1 + \frac{\theta}{N}\right)^{1/2} \frac{\left(n_2(n_2 + 1)S_2^2\right)^{1/2}}{\sigma_2}\left(1 + \frac{n_1}{n_2}\cdot\frac{Nu}{\theta}\right)^{1/2}} \]

\[ = \frac{(d - \mu_1 + \mu_2)(n_1 + n_2)^{1/2}}{n_2^{1/2} S_2 \left(1 + \frac{\theta}{N}\right)^{1/2} \left(1 + \frac{n_1}{n_2}\cdot\frac{Nu}{\theta}\right)^{1/2}}. \]

Hence, given $n_1$, $n_2$, $S_1$, $S_2$ and $\theta$, the t statistic can be used to test the difference between means.
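
As a check on the algebra, the rewritten form can be compared numerically with (1.2). The sketch below is illustrative only: the function names and the numerical inputs are assumptions, and `d` stands for the numerator $d - \mu_1 + \mu_2$.

```python
import math

def t_eq_1_2(d, n1, n2, s1_sq, s2_sq, sig1_sq, sig2_sq):
    """Statistic (1.2), written with both true variances."""
    num = d / math.sqrt(sig1_sq / (n1 + 1) + sig2_sq / (n2 + 1))
    chi = n1 * (n1 + 1) * s1_sq / sig1_sq + n2 * (n2 + 1) * s2_sq / sig2_sq
    return num / math.sqrt(chi / (n1 + n2))

def t_rewritten(d, n1, n2, s1_sq, s2_sq, theta):
    """Rewritten form, which needs only theta = sigma_1^2 / sigma_2^2."""
    u = s1_sq / s2_sq                 # observed ratio S_1^2 / S_2^2
    big_n = (n1 + 1) / (n2 + 1)       # N in the text
    denom = (math.sqrt(n2 * s2_sq)
             * math.sqrt(1 + theta / big_n)
             * math.sqrt(1 + (n1 / n2) * big_n * u / theta))
    return d * math.sqrt(n1 + n2) / denom

# Arbitrary illustrative inputs: theta = 3, so sigma_1^2 = 3 * sigma_2^2.
a = t_eq_1_2(1.3, 6, 8, 0.5, 0.8, sig1_sq=6.0, sig2_sq=2.0)
b = t_rewritten(1.3, 6, 8, 0.5, 0.8, theta=3.0)
print(a, b)
```

The two values agree whatever common scale is chosen for $\sigma_1^2$ and $\sigma_2^2$, which is why only the ratio $\theta$ has to be known.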

For Case (ii), $\sigma_1^2$ and $\sigma_2^2$ are not known or not in a known ratio. For large samples the t statistic above is asymptotically a standard normal deviate. However, the small sample solution of $t$ involves the unknown parameter $\theta$. Thus the problem of the test of hypothesis that the two means are equal when the true variance ratio is unknown arises.

This is the so-called Behrens-Fisher problem. Behrens (1929) proposed an original solution which was later extended by Fisher (1935). Since then, statisticians have proposed other solutions and presented discussions on these solutions (e.g. Welch (1938, 1947, 1956), Scheffé (1943, 1970), etc.). The controversy still remains today, forty-five years after Behrens proposed his solution, as to which solution is the best one for the Behrens-Fisher problem.

II. Objectives of Research

The objectives of the present research are three, with the second one as the major objective.

1. To give a brief survey of some of the major solutions to the Behrens-Fisher problem.

2. To examine the Behrens-Fisher test with emphasis on the method of verification of the test in Fisher (1961), and

(a) Calculate the size of the test for selected tabular values of $d$.

(b) Investigate the power of the test.

3. To attempt to reach some conclusions on the range of $n_1$ and $n_2$ in which the Behrens-Fisher test is best suited for use.


CHAPTER II. SURVEY OF LITERATURE


I. The Behrens-Fisher Solution

Behrens (1929) gave the solution as follows:

Suppose we have a sample of $n_1 + 1$ observations from a normal population, yielding a sample mean $\bar{x}_1$ and sample variance $(n_1 + 1)S_1^2$, where

\[ S_1^2 = \frac{\sum_{i=1}^{n_1+1} (x_{1i} - \bar{x}_1)^2}{n_1(n_1 + 1)}. \]

Then, if $\mu_1$ is the true mean of the population,

\[ \mu_1 = \bar{x}_1 + S_1 t_1, \tag{2.1} \]

where $t_1$ is distributed in Student's distribution with $n_1$ degrees of freedom.

Similarly, if a sample of $n_2 + 1$ observations is taken from a second normal population, then for the second mean, we can write

\[ \mu_2 = \bar{x}_2 + S_2 t_2, \tag{2.2} \]

where $t_2$ is distributed in Student's distribution with $n_2$ degrees of freedom independently of $t_1$, and $\bar{x}_2$, $S_2$ are defined as in the first population.

Under the null hypothesis that the two population means are equal, we have, from (2.1) and (2.2),

\[ d = \bar{x}_1 - \bar{x}_2 = S_2 t_2 - S_1 t_1. \tag{2.3} \]

Since $S_1$, $S_2$ are known from the samples, the expression on the right of (2.3) has a known distribution depending only on the distribution of $t_1$ and $t_2$.

Fisher extended Behrens' solution and derived the solution using the fiducial argument.

The principle of the fiducial argument is illustrated in Fisher (1935) by applying it to the Student's t solution.

If a sample of $n$ observations $x_1, \ldots, x_n$ is drawn from a normal population with mean $\mu$ and if from the sample we calculate two statistics,

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n(n - 1)}, \]

then $t = (\bar{x} - \mu)/S$ is distributed as $t$ with $n - 1$ degrees of freedom. Now if $p$ is the probability that $t > t_1$, then $p$ is also the probability that $\mu < \bar{x} - St_1$. The probability is known for all values of $t_1$. Hence the probability that $\mu$ is less than any assigned value is also known. In other words, the probability distribution of $\mu$ is known. The probability distribution of $\mu$ derived in this manner is termed the fiducial distribution of $\mu$. This fiducial argument has never been fully accepted (or perhaps understood) by statisticians, largely because $\mu$ is a parameter assumed to be constant (if unknown) and not a random variable with a probability distribution.

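Fisher (1935) then goes on to the problem of the difference between two means. He let $\bar{x}_1 - \bar{x}_2 = y$ and $\mu_1 - \mu_2 = \delta$. Then

\[ \epsilon = y - \delta = S_2 t_2 - S_1 t_1. \tag{2.4} \]

Let $\tan\theta = S_1/S_2$. Dividing both sides of (2.4) by $\sqrt{S_1^2 + S_2^2}$, we get

\[ \frac{\epsilon}{\sqrt{S_1^2 + S_2^2}} = \frac{S_2 t_2 - S_1 t_1}{\sqrt{S_1^2 + S_2^2}}. \]

Define

\[ d = \frac{\bar{x}_1 - \bar{x}_2 - \mu_1 + \mu_2}{\sqrt{S_1^2 + S_2^2}}. \]

Therefore

\[ d = t_2\cos\theta - t_1\sin\theta. \tag{2.5} \]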


Now, Fisher's fiducial approach starts with the fact that $t_1$ and $t_2$ are independent Student variates; $\mu_1$ and $\mu_2$ are regarded as random variables and $\bar{x}_1$, $\bar{x}_2$, $S_1^2$, $S_2^2$ are regarded as known and fixed once a pair of samples are taken. In this manner, $\theta$ is known and $d$ becomes a linear combination of two independent Student variates and thus has a known distribution.

When $n_1$ and $n_2$ are both increased, the distribution of $d$ tends to be normal and independent of $\theta$. When $\theta = 0°$ or $90°$, the distribution of $d$ is of Student's form, with $n_2$ or $n_1$ degrees of freedom respectively. In general $d$ involves $n_1$, $n_2$ and $\theta$. Thus to tabulate values of $d$ for any chosen probability, we would require a table of triple entry. Since critical values of $d$ for any $\alpha$ can be calculated, we can use $d$ to test the hypothesis that $\mu_1 - \mu_2 = 0$.
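
The distribution of $d$ in (2.5) can be explored by simulating the two independent Student variates. The following is a rough Monte Carlo sketch, not a method used in this study; the function names are assumptions. The value 2.364 is the tabulated .05 point for $n_1 = 6$, $n_2 = 8$, $\theta = 45°$ quoted in Chapter III, and 2.306 is the ordinary two-sided .05 point of Student's t with 8 degrees of freedom, which applies at $\theta = 0°$.

```python
import math
import random

def student_t(df, rng):
    """Draw one Student t variate as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)

def exceedance(n1, n2, theta_deg, d_crit, draws=40000, seed=1):
    """Estimate P(|d| > d_crit) for d = t2*cos(theta) - t1*sin(theta)."""
    rng = random.Random(seed)
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)
    hits = 0
    for _ in range(draws):
        d = student_t(n2, rng) * c - student_t(n1, rng) * s
        if abs(d) > d_crit:
            hits += 1
    return hits / draws

print(exceedance(6, 8, 45.0, 2.364))   # should be near .05
print(exceedance(6, 8, 0.0, 2.306))    # Student's t with n2 = 8 df
```

Both estimates should fall close to .05, up to simulation error of roughly .001 for this number of draws.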

A table of the significant values of $d$ as defined in (2.5) was calculated by Sukhatme (1938). The table covers values of even degrees of freedom $n_1$ and $n_2$ at 6, 8, 12, 24 and $\infty$, and values of $\theta$ at 0°, 15°, 30°, 45°, 60°, 75°, 90°.

Fisher (1941) gave asymptotic expansions for calculating the probabilities of $d$ and the critical values of $d$ for specified values of $\alpha$. He also gave a table for the case when $n_1$ or $n_2$ is large.

Fisher and Healy (1956) calculated the exact values of $d$ for small odd degrees of freedom 1, 3, 5, 7 for $\alpha$ = .10, .05, .02, .01. All three tables are reprinted in Fisher and Yates (1963).

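II. Welch Approximate Degrees of Freedom (APDF) Solution

Define $n_1$, $n_2$ to be sample sizes, $\lambda_i = 1/n_i$, $f_i = n_i - 1$, $i = 1, 2$. Welch (1938) considered two criteria, $u$ and $v$.

When $\mu_1 = \mu_2$,

\[ \frac{(\bar{x}_1 - \bar{x}_2)^2}{\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2} = \chi^2, \]

also

\[ \frac{f_1 S_1^2}{\sigma_1^2} = \chi_1^2, \qquad \frac{f_2 S_2^2}{\sigma_2^2} = \chi_2^2, \]

where $\chi^2$, $\chi_1^2$, $\chi_2^2$ are distributed as $\chi^2$ with 1, $f_1$, $f_2$ degrees of freedom respectively.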
It is possible to write $u$ and $v$ in the form

\[ y = \frac{\chi}{\sqrt{w}}, \qquad w = a\chi_1^2 + b\chi_2^2, \tag{2.6} \]

where $a$ and $b$ are constants depending on the $n$'s and $\sigma$'s, and $w$ is distributed independently of $\chi$.

The distribution of $w$ is then approximated by the Pearson Type III curve with

\[ p(w) = \frac{w^{f/2 - 1}}{(2g)^{f/2}\, e^{w/2g}\, \Gamma(f/2)}, \tag{2.7} \]

where $f$ and $g$ are chosen so that the first two moments of the curve agree with the true moments of $w$. The values of $f$ and $g$ are found to be

\[ f = \frac{(af_1 + bf_2)^2}{a^2 f_1 + b^2 f_2}, \qquad g = \frac{a^2 f_1 + b^2 f_2}{af_1 + bf_2}. \tag{2.8} \]

$w/g$ is distributed approximately as chi square with $f$ degrees of freedom. Hence

\[ \chi\,(w/fg)^{-1/2} = t \tag{2.9} \]

is distributed approximately as $t$ with $f$ degrees of freedom. Therefore, from (2.6) we have $y = c\,t_f$, with

\[ c = (fg)^{-1/2} = (af_1 + bf_2)^{-1/2}, \]

and $f$ is given by (2.8).


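Now $u$ and $v$ are of the form (2.6) and $a$ and $b$ for the two criteria can be found. Hence $c$ and $f$ can be found. For instance, for $v$, $c = 1$ and

\[ f = \frac{(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)^2}{\dfrac{\lambda_1^2\sigma_1^4}{f_1} + \dfrac{\lambda_2^2\sigma_2^4}{f_2}}. \]

Since $\sigma_1^2$, $\sigma_2^2$ are not known, Welch gave an estimate of $f$ as

\[ \hat{f} = \frac{(\lambda_1 S_1^2 + \lambda_2 S_2^2)^2}{\dfrac{\lambda_1^2 S_1^4}{n_1 + 1} + \dfrac{\lambda_2^2 S_2^4}{n_2 + 1}} - 2. \]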

Welch also stated that when it can be assumed that $\sigma_1 = \sigma_2$, then $u$ can be used. But if $\sigma_1 \neq \sigma_2$, $u$ would be biased and it is better to use $v$.

This solution has the obvious advantage that it only involves referring the criterion $u$ or $v$ to the Student's t table, which is readily available, with degrees of freedom given by $f$.
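
The moment matching behind the approximate degrees of freedom can be checked numerically. The sketch below is illustrative only: the function names are assumptions, and the constants $a$ and $b$ used for $v$ assume the usual form $v = (\bar{x}_1 - \bar{x}_2)/(\lambda_1 S_1^2 + \lambda_2 S_2^2)^{1/2}$, which is not written out in the text above. It uses the facts that a $\chi^2$ variate with $f_i$ degrees of freedom has mean $f_i$ and variance $2f_i$.

```python
def type3_fit(a, b, f1, f2):
    """Match the first two moments of w = a*chi1^2 + b*chi2^2.

    The fitted curve has mean f*g and variance 2*f*g^2.
    """
    mean_w = a * f1 + b * f2
    half_var_w = a * a * f1 + b * b * f2     # Var(w) / 2
    f = mean_w ** 2 / half_var_w
    g = half_var_w / mean_w
    return f, g

def welch_v_constants(n1, n2, sig1_sq, sig2_sq):
    """c and f for the v criterion (assumed form, see lead-in)."""
    lam1, lam2 = 1.0 / n1, 1.0 / n2
    f1, f2 = n1 - 1, n2 - 1
    total = lam1 * sig1_sq + lam2 * sig2_sq
    a = lam1 * sig1_sq / (f1 * total)
    b = lam2 * sig2_sq / (f2 * total)
    f, g = type3_fit(a, b, f1, f2)
    c = (f * g) ** -0.5
    return c, f

c, f = welch_v_constants(7, 9, 1.0, 1.0)
print(c, f)
```

For $v$ the constant $c$ comes out as 1, so $v$ is referred directly to the t table with $f$ degrees of freedom; $f$ always lies between $\min(f_1, f_2)$ and $f_1 + f_2$.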


III. Welch-Aspin Solution

Welch (1947) developed an approximate series solution for the Behrens-Fisher problem. He was concerned with finding a quantity $h$, calculated from the observed variances and depending on the size of the test $P$, such that

\[ \Pr\left[(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2) < h(S_1^2, S_2^2, P)\right] = P. \tag{2.10} \]

The solution is developed generally for $k$ populations, with the Behrens-Fisher problem a particular case at $k = 2$. Let us describe briefly the solution for the case $k = 2$.

Again, let $n_1$, $n_2$ be the sample sizes, $\lambda_i = 1/n_i$, $f_i = n_i - 1$, $i = 1, 2$. Let $j(S_1^2, S_2^2, P)$ be the probability that $(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)$ is less than $h(S_1^2, S_2^2, P)$ given $S_1^2$ and $S_2^2$. Now since $\bar{x}_1 - \bar{x}_2$ is distributed independently of $S_1^2$ and $S_2^2$, we have

\[ j(S_1^2, S_2^2, P) = \int_{-\infty}^{h(S_1^2, S_2^2, P)\,(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)^{-1/2}} (2\pi)^{-1/2}\, e^{-u^2/2}\, du. \]

To satisfy the condition of (2.10), we simply average $j(S_1^2, S_2^2, P)$ over the known probability distribution of $S_1^2$ and $S_2^2$ and the result will equal $P$. So

\[ \int\!\!\int j(S_1^2, S_2^2, P) \prod_{i=1}^{2} p(S_i^2)\, dS_i^2 = P. \tag{2.11} \]

Welch then proceeded to develop a series expansion for $h(S_1^2, S_2^2, P)$ using the relation (2.11). The solution developed (to the order $1/f_i^2$) is:

h(S12,S22,p)  =  5 


(1  +  5) 


(?^) 


:  lb  >  c  2  1  +  4  /  2  X.S.2  \2 

iSi  (e  ) 


E 

1 


,  r  c  \  fl 


where $\xi$ is the standard normal deviate such that

\[ \int_{-\infty}^{\xi} (2\pi)^{-1/2}\, e^{-u^2/2}\, du = P. \]


Aspin (1948, 1949) extended the series expansion for the Welch test and tabulated values of the $v$ statistic as a function of the observed variance ratio

\[ C = \frac{\lambda_1 S_1^2}{\lambda_1 S_1^2 + \lambda_2 S_2^2}. \]

Two-sided .10 and .02 critical values or one-sided .05 and .01 values are available at $f_1, f_2$ = 6, 8, 10, 15, 20, $\infty$. James, Trickett and Welch (1954, 1956) calculated two-sided .05 and .01 or one-sided .025 and .005 critical values of the $v$ statistic also in terms of $C$ and at the same degrees of freedom. Some of these tables are reprinted in Pearson and Hartley (1954).


IV. Scheffé's Solution

An exact confidence solution was given by Scheffé (1943) with the use of the t distribution. Let $(x_1, x_2, \ldots, x_{n_1})$ and $(y_1, \ldots, y_{n_2})$ be random samples from normal populations. Assume $n_1 \le n_2$. The solution is


X1  -  x2  -  +  u 


=  t  (f  ) 
a  n1 


domly  selected  subset  of  n-^  of  the  n2  values  in  the  second 
sample . 

Scheffé (1970), in a survey of various solutions to the problem, said that his solution is impractical and thus he does not recommend its use, because the calculation of $S$ involves putting in random order the elements of the larger sample $(y_1, \ldots, y_{n_2})$. The value of $S$, and hence the value of the test statistic $(\bar{x}_1 - \bar{x}_2 - \mu_1 + \mu_2)/S$, depends very much on the result of this randomization.

There  are  other  solutions  proposed. 


McCullough,  Gurland  and  Rosenberg  (1960)  proposed  a 


solution  using  the  statistic 


Y(rl'r2) 


where  E . 

1 


1 


1,2. 


$r_1$, $r_2$ are constants appropriately chosen to control the size of the test and the critical value for the test is a constant equal to 1. That is, in testing the hypothesis $\mu_1 = \mu_2$, the hypothesis is rejected when the statistic exceeds 1.

Jeffreys (1940) developed a Bayesian solution and arrived at the same distribution as the Behrens-Fisher solution. An approximation to the $d$ distribution by the Student's t distribution has been proposed by Patil (1964). Box and Tiao (1973) studied the approximation at one set of $(n_1, n_2, \theta)$.


At the same time that various solutions were put forward, many discussions and criticisms of these solutions arose.

On the Behrens-Fisher solution, Welch (1956) disagreed with the supposition that $S_1/S_2$ is fixed. He was of the opinion that the right approach was to average over $S_1^2$ and $S_2^2$ as he did in his series solution.

Others (Bartlett (1936), Neyman (1941)) based their arguments on the confidence interval approach and noted that in repeated sampling from populations with a fixed true variance ratio, the Behrens-Fisher test would not reject the null hypothesis with a frequency equal to the specified size. This, they say, shows some defect in the solution since in the confidence interval approach this requirement should be satisfied for a test of hypothesis.

In answer to this, Fisher (1939, 1961) clarified the hypothesis on which his test was based. This will be discussed in detail in the next chapter. Yates (1939) also pointed out that the apparent inconsistency of the solution is due to an insufficient appreciation of the fiducial basis of the solution.

Wilks (1940) stated that it can be shown that there exists no function of $\bar{x}_1$, $\bar{x}_2$, $S_1^2$, $S_2^2$, $\mu_1 - \mu_2$ independent of $\sigma_1$ and $\sigma_2$ having its probability law independent of the four population parameters. Hence it is impossible to obtain exact confidence limits for $\mu_1 - \mu_2$ corresponding to a given confidence coefficient. However, no proof has been published.

Bennett and Hsu (1961) conducted a sampling study on the power of the Behrens-Fisher and the Welch-Aspin tests. In the study, random normal samples $x_{ik}$ ($i = 1, \ldots, n_1$; $k = 1, \ldots, 100$) and $y_{jk}$ ($j = 1, \ldots, n_2$; $k = 1, \ldots, 100$) were generated. For each selected pair $(n_1, n_2)$ and assigned value of the parameters $\mu_1$, $\sigma_1$, $\mu_2$, $\sigma_2$, the ratio

\[ v_k = \frac{\bar{x}_k - \bar{y}_k}{\left(\dfrac{S_{1k}^2}{n_1} + \dfrac{S_{2k}^2}{n_2}\right)^{1/2}} \qquad (k = 1, \ldots, 100) \]

was computed for the 100 samples, where $\bar{x}_k$, $\bar{y}_k$, $S_{1k}^2$, $S_{2k}^2$ are the sample means and variances. The power of this test is the relative frequency of the event $\{v_k < -v_\epsilon \mid \mu_1 < \mu_2\}$, where $\epsilon$ is the specified size. Assigned values of $\mu_1$, $\mu_2$ are $\mu_1 = 0$, $\mu_2$ = 0, 0.5, 1.0, 2.0, 3.0, and the significance or nonsignificance of each $v_k$ was determined by the use of the Fisher and Yates tables and the Pearson-Hartley tables respectively for the two tests.

The results show that for smaller values of $n_1$ and $n_2$ the Behrens-Fisher test showed a smaller empirical size (power of the test at $\mu_1 = \mu_2$) than the Welch test. The power of the Welch test is greater than that of the Behrens-Fisher test over the whole range of $\mu_1 - \mu_2$.

Yates (1964) reviewed some of the concepts of the fiducial theory and the arguments for and against the Behrens-Fisher solution. He pointed out the importance of recognizing the proper reference set of a test of significance.

Mehta and Srinivasan (1970) and Wang (1971) also calculated the actual size of the Behrens-Fisher solution together with other solutions. The actual size of the test at various fixed true variance ratios and selected $(n_1, n_2)$ is found to be lower than the nominal size.

Thus proceeding along different lines of reasoning, Fisher, Welch, Scheffé, etc., arrived at different solutions to the same problem. It appears that the Behrens-Fisher solution is the most often discussed since Behrens (1929) and Fisher (1935) first put it forward and Sukhatme (1938) published tables for using the test. We are drawn to this solution ourselves and will examine it in more detail in this study.


CHAPTER III. VERIFICATION OF THE BEHRENS-FISHER TEST


In reply to the criticisms of the Behrens-Fisher solution, Fisher (1939, 1961) clarified the hypothesis on which the test was based and gave methods of verification of the test based on the hypothesis. The important points in Fisher (1939, 1961) are given below.

A. Fisher (1939)

I. The distribution of the $d^2$ statistic is expressed in terms of $t^2$, where $t$ has the Student's distribution with $n_1 + n_2$ degrees of freedom. The probability with which the tabulated values of $d$ are exceeded by the means of samples from populations having the same mean can then be calculated using this relationship with $t$.

Define

\[ S_1^2 = \frac{\sum_{i=1}^{n_1+1} (x_{1i} - \bar{x}_1)^2}{n_1(n_1 + 1)}, \qquad S_2^2 = \frac{\sum_{i=1}^{n_2+1} (x_{2i} - \bar{x}_2)^2}{n_2(n_2 + 1)}, \]

$v_1$, $v_2$ to be the true variances of the sample means, $v_i = \sigma_i^2/(n_i + 1)$, and $\sigma_1^2$, $\sigma_2^2$ the true population variances.

Suppose the pair of samples come from populations having the same mean. Therefore we have

\[ \bar{x}_1 - \bar{x}_2 \sim N(0, v_1 + v_2) \quad \text{or} \quad \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{v_1 + v_2}} \sim N(0,1). \]

Since $n_1 S_1^2/v_1$ and $n_2 S_2^2/v_2$ are distributed as $\chi^2$ with $n_1$ and $n_2$ degrees of freedom respectively, define

\[ t^2 = \frac{(\bar{x}_1 - \bar{x}_2)^2/(v_1 + v_2)}{\left(\dfrac{n_1 S_1^2}{v_1} + \dfrac{n_2 S_2^2}{v_2}\right)\Big/(n_1 + n_2)}. \tag{3.1} \]

Then $t$ has the Student's t distribution with $n_1 + n_2$ degrees of freedom. Let $w = v_1/v_2$ and

\[ d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_1^2 + S_2^2}} \]

as before. Multiplying the right hand side of (3.1) by $v_1/v_1$ we get

\[ t^2 = \frac{d^2 (S_1^2 + S_2^2)(n_1 + n_2)}{\left(1 + \dfrac{1}{w}\right)(n_1 S_1^2 + n_2 S_2^2 w)}. \tag{3.2} \]


Expression (3.2) clearly involves $w$, which is unknown. However, $w$ varies according to a certain probability law. Let $F = S_1^2 v_2/(S_2^2 v_1)$. $F$ has Snedecor's F distribution with $n_1$ and $n_2$ degrees of freedom. Hence, the random variable

\[ z = \tfrac{1}{2}\log F = \tfrac{1}{2}\log\left(S_1^2 S_2^{-2} w^{-1}\right) \]

has the known z distribution with $n_1$ and $n_2$ degrees of freedom. With $S_1^2/S_2^2$ known from the sample, $w$ is distributed as $S_1^2/(S_2^2 e^{2z})$. The probability with which a particular tabulated $d$ value will be exceeded is then the average value of the probabilities with which it will be exceeded for various possible values of $w$.


In short, the procedure is as follows: In order to obtain a good spread of the values of $w$, let $p_i$ be a member of a series of uniformly spaced fractions, such as $\frac{1}{32}, \frac{3}{32}, \ldots, \frac{31}{32}$. Carry out steps a-d for each $p_i$.

a. Calculate $w$.

b. Calculate $t_{p_i}/d$ using (3.2).

c. Calculate $t_{p_i}$ using the $d_{\alpha/2}$ value from Sukhatme's table.

d. Calculate $P(t > t_{p_i}) = A$. This is the probability with which $t_{p_i}$ will be exceeded.

e. Take the average of the values of $A$ calculated from (d). The average of these values should be $\alpha/2$.
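
Steps a to e can be sketched in Python as follows. This is a minimal sketch rather than Fisher's own computation: the helper names are assumptions, the Student and F probabilities are obtained by simple numerical integration and bisection instead of tables, and the incomplete beta routine assumes $n_1, n_2 \ge 2$. The value 2.364 is the Sukhatme entry quoted in this chapter for $n_1 = 6$, $n_2 = 8$, $S_1^2/S_2^2 = 1$; it is the two-sided .05 point, so the average is compared with $\alpha/2 = .025$.

```python
import math

def simpson(func, upper, steps):
    """Simpson's rule on [0, upper]; steps must be even."""
    h = upper / steps
    total = func(0.0) + func(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(k * h)
    return total * h / 3

def t_upper_tail(t, df, steps=400):
    """P(T > t) for Student's t with df degrees of freedom, t >= 0."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    dens = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    return 0.5 - simpson(dens, t, steps)

def f_quantile(p, n1, n2, steps=100):
    """Value F_p with P(F <= F_p) = p for Snedecor's F(n1, n2); assumes n1, n2 >= 2."""
    a, b = n1 / 2, n2 / 2
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    kern = lambda u: u ** (a - 1) * (1 - u) ** (b - 1)
    lo, hi = 0.0, 1.0
    for _ in range(40):                       # bisection on y = n1 F / (n1 F + n2)
        mid = (lo + hi) / 2
        if norm * simpson(kern, mid, steps) < p:
            lo = mid
        else:
            hi = mid
    y = (lo + hi) / 2
    return n2 * y / (n1 * (1 - y))

def fisher_1939_average(n1, n2, var_ratio, d, n_points=16):
    """Average of P(t > t_p) over the grid p_i = (2i - 1) / (2 n_points).

    var_ratio is the observed S_1^2 / S_2^2 (S_2^2 is scaled to 1).
    """
    total = 0.0
    for i in range(1, n_points + 1):
        p = (2 * i - 1) / (2 * n_points)
        w = var_ratio / f_quantile(p, n1, n2)              # step a: w = S1^2 / (S2^2 e^{2z})
        t_sq = (d * d * (var_ratio + 1) * (n1 + n2)
                / ((1 + 1 / w) * (n1 * var_ratio + n2 * w)))   # steps b, c: (3.2)
        total += t_upper_tail(math.sqrt(t_sq), n1 + n2)    # step d
    return total / n_points                                # step e

print(fisher_1939_average(6, 8, 1.0, 2.364, n_points=16))
print(fisher_1939_average(6, 8, 1.0, 2.364, n_points=200))
```

With a finer grid of fractions the average should move toward .025, since the coarse grid under-represents the tails of the z distribution.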

Fisher illustrated this procedure with an example. The case he considered was $n_1 = n_2 = 6$, $S_1 = S_2$, $\alpha/2 = .025$. Values of $p_i = \frac{1}{32}, \frac{3}{32}, \ldots$, etc. were used. The probability with which the tabulated $d$ value is exceeded is calculated to be .0238. It is suggested that a finer graduation would increase the contribution of the tails and yield a value closer to .025.

II. Values of $d$ appropriate to different suppositions on the true variances are given.

Consider three cases:

(1) The true variances are equal ($\sigma_1^2 = \sigma_2^2$). That is,

\[ w = \frac{n_2 + 1}{n_1 + 1}. \]

From (3.2) we have

\[ d^2 = \frac{\left(1 + \dfrac{1}{w}\right)(n_1 S_1^2 + n_2 S_2^2 w)\,t^2}{(n_1 + n_2)(S_1^2 + S_2^2)}. \]

Substituting $w = \dfrac{n_2 + 1}{n_1 + 1}$ into the above formula, we get

\[ d^2 = \frac{(n_1 + n_2 + 2)\left(n_1(n_1 + 1)S_1^2 + n_2(n_2 + 1)S_2^2\right)t^2}{(n_1 + n_2)(n_1 + 1)(n_2 + 1)(S_1^2 + S_2^2)}. \tag{3.3} \]

Note that $d^2 = t^2$ when $n_1 = n_2$.

If $n_1 = 6$, $n_2 = 8$, the .05 value of $t$ for 14 df is 2.145. Letting $S_1^2/S_2^2$ equal 3, 1, $\frac{1}{3}$ successively, we get from (3.3), $d$ = 2.033, 2.109, 2.320.


(2) $w$ is exactly equal to the sample variance ratio. That is, $w = S_1^2/S_2^2$. From (3.2) we have

\[ t^2 = \frac{d^2 (n_1 + n_2)(S_1^2 + S_2^2)}{\left(1 + \dfrac{S_2^2}{S_1^2}\right)(n_1 S_1^2 + n_2 S_1^2)} = \frac{d^2 (n_1 + n_2)(S_1^2 + S_2^2)}{(n_1 + n_2)(S_1^2 + S_2^2)} = d^2. \]

(3) $w$ differs from the sample variance ratio $S_1^2/S_2^2$ by sampling errors given by the z distribution. In this case the $d$ values are those given in Sukhatme's table.

At $\alpha = .05$ the values of $d$ appropriate to the three cases are:

S_1^2/S_2^2      3        1       1/3

Case 1         2.033    2.109    2.320
Case 2         2.145    2.145    2.145
Case 3         2.398    2.364    2.332

From these two points in the paper, it is now clear that the Behrens-Fisher test was based on the hypothesis that the true variance ratio $w$ is neither fixed nor equal to the sample variance ratio but varies according to a certain probability law. Moreover, from the second point, by assuming otherwise, one is in fact dealing with values of $d$ other than those in Sukhatme's table.

A further note is that Fisher (1956) derived the exact sampling distribution of $d$ as

\[ \frac{d^2}{t^2} = \frac{(1 + e^{2z}\cot^2\theta)(n_1 + n_2 e^{-2z})}{(n_1 + n_2)\,\mathrm{cosec}^2\theta}, \]

which is the same as (3.2) when $S_1^2/(S_2^2 e^{2z})$ and $\tan\theta$ are substituted for $w$ and $S_1/S_2$ in (3.2).

B. Fisher (1961)

Fisher (1961) outlined a method of verification using random sampling. He wrote that the first step in understanding a test of significance, and hence in setting up a process of verification of the test, is the recognition of the appropriate reference set of the test. (Reference set was defined in Fisher (1959) as the population for which probability statements of the test are made.) The reference set in this test is characterized by the known values $n_1$, $n_2$ and $S_1/S_2$. Hence to set up a sampling process to verify the tabulated $d$ values of the Behrens-Fisher test, the first step is to be able to obtain random samples of sizes $n_1 + 1$, $n_2 + 1$ respectively and having the correct value of $S_1/S_2$.

The method of verification has the same approach as the one in Fisher (1939) in the sense that the true variance ratio is again not assumed fixed, in keeping with the hypothesis on which the test is based. We outline the method in the following steps:

(1) Let $p_i$ be a series of fractions, such as

\[ p_i = \frac{2i - 1}{20{,}000} \qquad (i = 1, 2, \ldots, 10000). \]

(2) If $\sigma_1^2$, $\sigma_2^2$ are as defined above, the distribution of

\[ z = \tfrac{1}{2}\log\frac{(n_1 + 1)S_1^2\sigma_2^2}{(n_2 + 1)S_2^2\sigma_1^2} \]

is known in terms of $n_1$ and $n_2$. Let $z_p$ stand for the value such that $P(z < z_p) = p$. Calculate $z_{p_i}$ for each $p_i$. Therefore for each $i$, $i = 1, \ldots, 10000$, the true variance ratio

\[ \frac{\sigma_2^2}{\sigma_1^2} = \frac{(n_2 + 1)S_2^2}{(n_1 + 1)S_1^2}\exp(2z_{p_i}) \]

is known.

(3) Take samples of sizes $n_1 + 1$, $n_2 + 1$ respectively from two normal populations having equal means and variances in the ratio calculated in step 2.

Calculate $S_1/S_2$ of the samples taken. Reject the pair of samples if the ratio $S_1/S_2$ does not agree, within a specified tolerance, with the given value of $S_1/S_2$. For each value of $i$, the first pair that satisfies this condition is taken as a representative sample of the reference set.

(4) To verify the tabulated $d_P$ value corresponding to the given $n_1$, $n_2$, $S_1/S_2$, find the number of samples falling in the three regions:

\[ \bar{x}_1 - \bar{x}_2 < -d_P\sqrt{S_1^2 + S_2^2}, \]

\[ -d_P\sqrt{S_1^2 + S_2^2} \le \bar{x}_1 - \bar{x}_2 \le d_P\sqrt{S_1^2 + S_2^2}, \]

\[ d_P\sqrt{S_1^2 + S_2^2} < \bar{x}_1 - \bar{x}_2, \]

where $\bar{x}_1$, $\bar{x}_2$, $S_1$, $S_2$ are calculated from the pair of samples taken. The expected numbers in these three regions are 250, 9500, 250 for the $\alpha = .05$ point of $d$ and 50, 9900, 50 for the .01 point of $d$.
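
Steps (1) to (4) can be sketched as a small rejection-sampling program. This is a reduced illustration under stated assumptions, not the implementation used in this study: it uses 200 fractions instead of 10,000, a tolerance of 0.1 on the logarithm of $S_1^2/S_2^2$, quantiles found by numerical integration and bisection (assuming $n_1, n_2 \ge 2$), and hypothetical function names. The critical value 2.364 is the tabulated .05 point for $n_1 = 6$, $n_2 = 8$, $S_1^2/S_2^2 = 1$ quoted in Section A.

```python
import math
import random

def f_quantile(p, n1, n2, steps=100):
    """e^{2 z_p}: the p-quantile of Snedecor's F(n1, n2); assumes n1, n2 >= 2."""
    a, b = n1 / 2, n2 / 2
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    kern = lambda u: u ** (a - 1) * (1 - u) ** (b - 1)

    def cdf(y):                                # Simpson's rule on [0, y]
        h = y / steps
        total = kern(0.0) + kern(y)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * kern(k * h)
        return norm * total * h / 3

    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    y = (lo + hi) / 2
    return n2 * y / (n1 * (1 - y))

def mean_and_s_sq(xs):
    """Sample mean and S^2 = sum of squares / (n (n + 1)), n = len(xs) - 1."""
    m = len(xs)
    xbar = sum(xs) / m
    return xbar, sum((x - xbar) ** 2 for x in xs) / ((m - 1) * m)

def verify(n1, n2, target_ratio, d_crit, n_points=200, tol=0.1, seed=1):
    """Counts of samples in the lower, middle and upper regions of step (4)."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for i in range(1, n_points + 1):
        p = (2 * i - 1) / (2 * n_points)                       # step (1)
        # step (2): sigma_2^2 / sigma_1^2, taking sigma_1^2 = 1
        sig2_sq = (n2 + 1) / ((n1 + 1) * target_ratio) * f_quantile(p, n1, n2)
        while True:                                            # step (3)
            xbar1, s1_sq = mean_and_s_sq([rng.gauss(0, 1) for _ in range(n1 + 1)])
            xbar2, s2_sq = mean_and_s_sq(
                [rng.gauss(0, math.sqrt(sig2_sq)) for _ in range(n2 + 1)])
            if abs(math.log(s1_sq / s2_sq / target_ratio)) < tol:
                break
        stat = (xbar1 - xbar2) / math.sqrt(s1_sq + s2_sq)      # step (4)
        counts[0 if stat < -d_crit else 2 if stat > d_crit else 1] += 1
    return counts

print(verify(6, 8, 1.0, 2.364))   # expected roughly 5, 190, 5 out of 200
```

A tighter tolerance brings the accepted pairs closer to the reference set but increases the number of rejected pairs, particularly for the extreme fractions.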

C. Conclusion

The Behrens-Fisher solution, arising from the fiducial argument, has the following suppositions:

1. The observed means $\bar{x}_1$, $\bar{x}_2$ and observed variance ratio $S_1^2/S_2^2$ are regarded as known quantities once a pair of samples are taken, and this leads to the solution.

2. The unknown true variance ratio $w$ is distributed as $S_1^2/(S_2^2 e^{2z})$, where $z$ has a known distribution.

Fisher (1939, 1961) emphasized that a proper verification of his test requires an understanding of the hypothesis underlying the test. From the literature that we have covered, it is clear that most of the criticisms of the test are that the actual size of the test is lower than the nominal size. However, no mention was ever made of the clarification and verification procedure of the test in the two papers discussed above.

In this present study, we do not intend to enter into
an involved discussion of the concept of fiducial inference.
The merit of a test of significance is usually measured by
the closeness of its actual size to the nominal size and its
power performance. Since these are what most of the critics
of the test are concerned with, we decided to conduct a
verification of the test through the calculation of the
actual size and its power.

A  proper  verification  of  the  Behrens-Fisher  test 
based  on  its  suppositions  is  contained  in  Fisher  (1939, 
1961).  In  the  present  research,  we  will  be  mainly  concerned 
with  a  sampling  study  on  the  actual  size  of  the  test  based 
on  Fisher  (1961) . 

A description of the implementation of the verification
and a tabulation and discussion of the results on
actual size are presented in Sections A and C in the next
chapter. Actual size calculated using (3.2) is also given
in Section C. Section D contains calculation of the power
of the test.


CHAPTER IV. IMPLEMENTATION OF FISHER'S PROPOSED VERIFICATION OF THE BEHRENS-FISHER TEST

A.  Description  of  Algorithm 

We describe below an algorithm to implement the verification
of the Behrens-Fisher test proposed by Fisher (1961).

Step 1. Calculate $z_{p_i}$ for each $p_i$, where the $p_i$ are a series of
fractions ($p_i = (2i - 1)/(2N)$, $i = 1, 2, \ldots, N$) and z has the
known z distribution with $n_1$, $n_2$ degrees of freedom.

Calculation of $z_{p_i}$ involves the standard normal
deviate corresponding to the same $p_i$. Two approximations
are used in order to calculate $z_{p_i}$ for each $p_i$.

1. An approximation to calculate a standard normal
deviate $y_p$ satisfying

\[
p = \frac{1}{\sqrt{2\pi}} \int_{y_p}^{\infty} e^{-t^2/2}\,dt = Q(y_p).
\]

That is, calculate $y_p$ such that $Q(y_p)$ is the upper tail of a
standard normal cumulative distribution function. We use the
inverse normal approximation due to Hastings (1955). This
method gives a $y_p^*$ which approximates $y_p$ satisfying

\[
Q(y_p) = p \quad \text{for } 0 < p \le 0.5. \tag{4.1}
\]

The value $y_p$ which satisfies equation (4.1), in terms of the
approximation and associated error, is

\[
y_p = t - \frac{c_0 + c_1 t + c_2 t^2}{1 + d_1 t + d_2 t^2 + d_3 t^3} + \varepsilon(p),
\]

where $|\varepsilon(p)| < 4.5 \times 10^{-4}$,

    c_0 = 2.515517        d_1 = 1.432788
    c_1 = 0.802853        d_2 = 0.189269
    c_2 = 0.010328        d_3 = 0.001308

and $t = (-2 \ln p)^{1/2}$.

For $p > 0.5$, use $1 - p$, and the deviate we want is $-y_{1-p}$.

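2. An approximation to calculate $z_{p_i}$ such that
$P(z < z_{p_i}) = p_i$.

The approximation is due to Cornish and Fisher
(1937). The value of $z_p$ corresponding to a probability $\alpha$,
that is $P(z > z_p) = \alpha$, is expressed in terms of a standard
normal deviate $\xi$ corresponding to the same probability.
We have

\[
\begin{aligned}
z_p ={}& \xi\,(\sigma/2)^{1/2} - \tfrac{1}{6}\,\delta\,(\xi^2 + 2) \\
&+ (\sigma/2)^{1/2} \left\{ (\xi^3 + 3\xi)\,\frac{\sigma}{24} + (\xi^3 + 11\xi)\,\frac{\delta^2}{72\sigma} \right\} \\
&- (\xi^4 + 9\xi^2 + 8)\,\frac{\delta\sigma}{120} + (3\xi^4 + 7\xi^2 - 16)\,\frac{\delta^3}{3240\sigma} \\
&+ (\sigma/2)^{1/2} \left\{ (\xi^5 + 20\xi^3 + 15\xi)\,\frac{\sigma^2}{1920} + (\xi^5 + 44\xi^3 + 183\xi)\,\frac{\delta^2}{2880} + (9\xi^5 - 284\xi^3 - 1513\xi)\,\frac{\delta^4}{155520\sigma^2} \right\},
\end{aligned}
\]

where

\[
\sigma = n_1^{-1} + n_2^{-1}, \qquad \delta = n_1^{-1} - n_2^{-1}.
\]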


Therefore, to calculate $z_{p_i}$ for each $p_i$ such that
$P(z < z_{p_i}) = p_i$, we do the following:

(1) Compute $p_i$ for $i = 1, \ldots, N$, where N is the
number of pairs of samples generated.

(2) Let $q_i = 1 - p_i$ so that $q_i = P(z > z_{p_i})$.

(3) Calculate $y_{p_i}$ such that $Q(y_{p_i}) = q_i$, using the
first approximation as described previously.

(4) Calculate $z_{p_i}$ such that $P(z > z_{p_i}) = q_i$ using
the second approximation with $\xi = y_{p_i}$.

Hence we have $z_{p_i}$ for $i = 1, \ldots, N$ such that
$P(z > z_{p_i}) = q_i$, or $P(z < z_{p_i}) = 1 - q_i = p_i$.
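Step 1 can be sketched in a few lines of Python. This is only an illustration of the two approximations described above, not the program used in this study: the function names are hypothetical, the Hastings constants are those listed in the text, and the Cornish and Fisher coefficients are taken as read from the expansion above, with $n_1$ treated as the first degrees of freedom of z.

```python
import math

# Hastings (1955) constants, as listed in the text.
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def normal_deviate(q):
    """Approximate y with upper-tail probability Q(y) = q, for 0 < q < 1."""
    if q > 0.5:
        # Symmetry: the deviate for q is minus the deviate for 1 - q.
        return -normal_deviate(1.0 - q)
    t = math.sqrt(-2.0 * math.log(q))
    return t - (C0 + C1 * t + C2 * t * t) / (1.0 + D1 * t + D2 * t * t + D3 * t ** 3)


def z_deviate(q, n1, n2):
    """Approximate z_p with P(z > z_p) = q for the z distribution with n1, n2 d.f."""
    xi = normal_deviate(q)
    s = 1.0 / n1 + 1.0 / n2  # sigma in the text
    d = 1.0 / n1 - 1.0 / n2  # delta in the text
    r = math.sqrt(s / 2.0)
    x2, x3, x4, x5 = xi ** 2, xi ** 3, xi ** 4, xi ** 5
    return (
        xi * r
        - d * (x2 + 2.0) / 6.0
        + r * (s * (x3 + 3.0 * xi) / 24.0 + d * d * (x3 + 11.0 * xi) / (72.0 * s))
        - d * s * (x4 + 9.0 * x2 + 8.0) / 120.0
        + d ** 3 * (3.0 * x4 + 7.0 * x2 - 16.0) / (3240.0 * s)
        + r * (
            s * s * (x5 + 20.0 * x3 + 15.0 * xi) / 1920.0
            + d * d * (x5 + 44.0 * x3 + 183.0 * xi) / 2880.0
            + d ** 4 * (9.0 * x5 - 284.0 * x3 - 1513.0 * xi) / (155520.0 * s * s)
        )
    )


def z_values(N, n1, n2):
    """z_{p_i} for p_i = (2i - 1) / (2N), i = 1..N, so that P(z < z_{p_i}) = p_i."""
    return [z_deviate(1.0 - (2 * i - 1) / (2.0 * N), n1, n2) for i in range(1, N + 1)]
```

For example, `z_values(5000, 6, 12)` would give the 5000 values of $z_{p_i}$ needed for one run with $n_1 = 6$, $n_2 = 12$.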


Step 2. Calculate the true variance ratio, with $n_1$, $n_2$ and
$S_1^2/S_2^2$ known, for each i.

True variance ratio

\[
K = \frac{(n_2 + 1)\,S_2^2}{(n_1 + 1)\,S_1^2}\; e^{2 z_{p_i}}.
\]


Step 3. Generate random samples from two normal populations
with equal means and with variances in the ratio K just
calculated in Step 2. There are two ways of generating
random samples from the two specified distributions. Assume
first that $\mu_1 = \mu_2 = 0$, $\sigma_1^2 = 1$, and so $\sigma_2^2 = K\sigma_1^2 = K$, and
that $S_1^2/S_2^2 = C$ is known.

a. The first way is to generate samples $x_{1i}$ of size
$n_1 + 1$ from a normal population with mean zero and unit
variance. Next generate samples $x_{2i}$ of size $n_2 + 1$ from a
normal population with zero mean and variance $\sigma_2^2$. Compute
the sample means $\bar{x}_1$ and $\bar{x}_2$ and the sample variances $S_1^2$ and $S_2^2$.

b. The second way is much faster and is based
on the independence of the numerator and the denominator of
the d statistic. It is adapted from the master's thesis by
West (1967). This method is used in this research. The d
statistic is defined as

\[
d = \frac{\bar{x}_1 - \bar{x}_2 - \mu_1 + \mu_2}{\sqrt{S_1^2 + S_2^2}}.
\]

(i) Consider the numerator $\bar{x}_1 - \bar{x}_2 - \mu_1 + \mu_2$.

From Chapter I, we have

\[
\bar{x}_1 - \bar{x}_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1 + 1} + \frac{\sigma_2^2}{n_2 + 1}\right).
\]

Again set $\mu_1 = 0$, $\sigma_1^2 = 1$, $\sigma_2^2 = K\sigma_1^2 = K$; then

\[
\bar{x}_1 - \bar{x}_2 \sim N\!\left(-\mu_2,\ \frac{1}{n_1 + 1} + \frac{K}{n_2 + 1}\right).
\]

Hence, for the numerator, we generate a normal random
deviate with specified mean $-\mu_2$ and variance

\[
\frac{1}{n_1 + 1} + \frac{K}{n_2 + 1}.
\]


(ii) Consider the denominator

\[
\sqrt{S_1^2 + S_2^2}
\]

with

\[
S_1^2 = \frac{\sum_{i=1}^{n_1 + 1} (x_{1i} - \bar{x}_1)^2}{n_1 (n_1 + 1)},
\]

where each $x_{1i} \sim N(\mu_1, \sigma_1^2)$; then it is known that

\[
\frac{n_1 (n_1 + 1)\,S_1^2}{\sigma_1^2}
\]

has a $\chi^2$ distribution with $n_1$ degrees of freedom. Thus
to generate random numbers from the distributions of
$S_1^2$ and $S_2^2$, we can generate samples from $\chi^2$ distributions
with $n_1$ and $n_2$ degrees of freedom respectively,
since generating from the $\chi^2$ distribution is much
easier. Then since

\[
\frac{n_1 (n_1 + 1)\,S_1^2}{\sigma_1^2} \text{ is } \chi^2 \text{ with } n_1 \text{ d.f.} \quad \text{and} \quad \frac{n_2 (n_2 + 1)\,S_2^2}{\sigma_2^2} \text{ is } \chi^2 \text{ with } n_2 \text{ d.f.},
\]

if the two chi square numbers generated are $y_{1i}$
and $y_{2i}$, then by the transformations $1/n_1(n_1 + 1)$ and
$K/n_2(n_2 + 1)$,

\[
\frac{y_{1i}}{n_1 (n_1 + 1)} \quad \text{and} \quad \frac{y_{2i}\,K}{n_2 (n_2 + 1)}
\]

are random observations from the distributions of
$S_1^2$ and $S_2^2$ respectively.
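The second way of generating a pair can be illustrated with a short Python sketch. It assumes $\mu_1 = 0$, $\sigma_1^2 = 1$ and $\sigma_2^2 = K$ as above; the function names are hypothetical, and the chi square deviates are drawn here from the standard library gamma generator (shape n/2, scale 2) rather than by the methods of Section B.

```python
import math
import random


def chi_square(n, rng):
    """Chi square deviate with n d.f., drawn as Gamma(shape n/2, scale 2)."""
    return rng.gammavariate(n / 2.0, 2.0)


def draw_pair(n1, n2, K, rng, mu2=0.0):
    """Return (x1bar - x2bar, S1^2, S2^2) for mu1 = 0, sigma1^2 = 1, sigma2^2 = K."""
    variance = 1.0 / (n1 + 1) + K / (n2 + 1)
    numerator = rng.gauss(-mu2, math.sqrt(variance))
    s1_sq = chi_square(n1, rng) / (n1 * (n1 + 1))
    s2_sq = chi_square(n2, rng) * K / (n2 * (n2 + 1))
    return numerator, s1_sq, s2_sq


def d_statistic(numerator, s1_sq, s2_sq, mu2=0.0):
    """d = (x1bar - x2bar - mu1 + mu2) / sqrt(S1^2 + S2^2), with mu1 = 0."""
    return (numerator + mu2) / math.sqrt(s1_sq + s2_sq)
```

Only one normal deviate and two chi square deviates are needed per pair, instead of $n_1 + n_2 + 2$ normal deviates, which is why this route is faster than the first.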


Step 4. Calculate $S_1^2/S_2^2$. Accept the pair of samples if

\[
\frac{\left| S_1^2/S_2^2 - C \right|}{C} \le \varepsilon,
\]

where $\varepsilon$ is the specified tolerance limit; otherwise generate
another observed value for $S_1^2/S_2^2$.

Step 5. If $d_p$ is the tabular value to be verified, count
the number of observations falling in the regions

\[
d < -d_p, \qquad -d_p \le d \le d_p, \qquad d > d_p.
\]

The total number of observations falling in the first and
third regions divided by N gives the actual size of the
two-tailed test.
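Steps 4 and 5 amount to a rejection loop followed by a count. The Python sketch below illustrates this under the same assumptions as Step 3 ($\mu_1 = \mu_2 = 0$, $\sigma_1^2 = 1$, $\sigma_2^2 = K$). The function names are hypothetical, the chi square deviates come from the standard library gamma generator, and the numerator is drawn only after a pair is accepted, which relies on the independence of numerator and denominator noted in Step 3.

```python
import math
import random


def accept(s1_sq, s2_sq, C, eps):
    """Step 4: keep the pair when |S1^2/S2^2 - C| / C <= eps."""
    return abs(s1_sq / s2_sq - C) / C <= eps


def first_accepted_d(n1, n2, K, C, eps, rng):
    """Draw pairs until one passes the tolerance check; return (d, rejected count)."""
    rejected = 0
    while True:
        s1_sq = rng.gammavariate(n1 / 2.0, 2.0) / (n1 * (n1 + 1))
        s2_sq = rng.gammavariate(n2 / 2.0, 2.0) * K / (n2 * (n2 + 1))
        if accept(s1_sq, s2_sq, C, eps):
            sd = math.sqrt(1.0 / (n1 + 1) + K / (n2 + 1))
            return rng.gauss(0.0, sd) / math.sqrt(s1_sq + s2_sq), rejected
        rejected += 1


def count_regions(d_values, d_p):
    """Step 5: counts in d < -d_p, -d_p <= d <= d_p, d > d_p."""
    left = sum(1 for d in d_values if d < -d_p)
    right = sum(1 for d in d_values if d > d_p)
    return left, len(d_values) - left - right, right


def actual_size(d_values, d_p):
    """Proportion of observations in the two critical regions."""
    left, _, right = count_regions(d_values, d_p)
    return (left + right) / len(d_values)
```

With counts of 256, 4490 and 254 in the three regions out of 5000 pairs, for instance, the actual size is (256 + 254)/5000 = .1020.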

B. Pseudo-Random Number Generation

Generation of random numbers from certain distributions,
for example, a normal distribution with specified
mean and variance, or a chi square distribution with n degrees
of freedom, requires first of all generation of random
numbers from a U(0,1) distribution. Random numbers from
various distributions are then obtained by applying some
transformation to the U(0,1) random numbers. Hence, a good random
number generator for U(0,1) numbers is a basic requirement
for most sampling studies.

In this study, we need to generate random numbers
from normal and chi square distributions. This involves
generation of random numbers from a U(0,1) distribution.
Moreover, generation of a single chi square random number
often requires more than one U(0,1) number, the number
required being dependent on the degrees of freedom of the
chi square distribution. Also, in our algorithm, very large
numbers of random numbers are needed because for each of
the 5000 pairs of samples taken, we have to keep on generating
samples, rejecting those whose $S_1^2/S_2^2$ ratio lies outside
the tolerance limit and accepting the first pair of
samples whose $S_1^2/S_2^2$ ratio is within the tolerance limit.
Thus we need a good and fast uniform random number generator
to start with.

By a fast pseudo-random number generator, we mean that
the generating time per number should be reasonably short.
By a good pseudo-random uniform number generator we mean one
that:

(1)  generates  numbers  whose  distribution  approximates 
closely  the  U(0,1)  distribution. 

(2)  satisfies  various  statistical  criteria  of 
randomness . 

(3)  generates  a  sequence  of  numbers  long  enough 
before  the  numbers  repeat  themselves. 

To  satisfy  (1) ,  the  chi  square  goodness  of  fit  test 
or  the  Kolmogorov-Smirnov  test  is  usually  performed.  For 
(2) ,  one  performs  various  statistical  tests  on  the  random 
numbers.  A  detailed  discussion  on  the  various  tests  of 
randomness  is  found  in  Jansson  (1966).  (3)  involves  careful 

choice  of  constants  in  the  mathematical  equations  used  to 
generate  the  pseudo-random  numbers. 


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A  number  of  existing  generators  were  studied.  Among 
these,  some  are  designed  for  computers  other  than  the  IBM/360 
(see  Downham  and  Roberts  (1967))  .  Some  are  very  fast  but 
results  on  the  statistical  tests  on  the  generator  are  not 

Icicle .  (See  for  example,  Seraphin  (1969)  )  .  After  careful 
consideration, we decided that the pseudo-random number
generator most suitable for our study is the one by
Lewis, Goodman and Miller (1969).

This generator employs the frequently used multiplicative
congruential method. In this method, the set of
numbers $x_i$ is generated by the equation

\[
x_{i+1} \equiv A x_i \pmod{p},
\]

and the sequence $x_i / p$ is the uniform
random number sequence. For this particular generator,
$A = 7^5$ and $p = 2^{31} - 1$.
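The recurrence is short enough to write out. The Python sketch below is an illustration only and is not the Assembler routine used in this study; the function names are hypothetical, and scaling by p to reach the (0,1) interval is an assumption about how the integers are converted.

```python
A = 7 ** 5        # multiplier, 16807
P = 2 ** 31 - 1   # modulus, 2147483647


def lcg_next(x):
    """One step of the multiplicative congruential recurrence."""
    return (A * x) % P


def uniforms(seed, count):
    """Return `count` numbers in (0, 1) generated from an integer seed in 1..P-1."""
    x = seed
    out = []
    for _ in range(count):
        x = lcg_next(x)
        out.append(x / P)  # assumed scaling to the unit interval
    return out
```

Because p is prime and the seed lies between 1 and p - 1, the integer state never reaches zero, so every number returned lies strictly between 0 and 1.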


This generator was chosen because it was extensively
tested and the tests were done on the IBM 360/67 computer.
The various tests done are described, and the test results
presented, in the same paper; they show that the generator is
remarkably good. Moreover, an Assembler language program of
the generator is available and may be called conveniently in
a Fortran program by the statement CALL RANDOM (INT, REAL).
Here INT is any full word integer variable which should be
given an initial value and REAL, the random number generated,
is a full word real variable (single precision). Since it
is written in Assembler language, this generator is much
faster than other generators written in Fortran or other
high level languages. We have an upper bound of approximately
31.2 μsec to call a random number on the 360/67.
It is for the above reasons that we chose this generator.

Normal  Random  Number  Generation 

As  discussed  in  Section  A,  we  need  to  generate  a 
normal  random  deviate  for  the  numerator  of  the  d  statistic. 
The  method  used  is  the  widely  used  Box  and  Muller  method 
found  in  Box  and  Muller  (1958) . 

Random normal deviates are obtained by first generating
two uniform random numbers $U_1$, $U_2$ on the interval
(0,1). Then calculate

\[
x_1 = (-2 \ln U_1)^{1/2} \cos 2\pi U_2,
\]
\[
x_2 = (-2 \ln U_1)^{1/2} \sin 2\pi U_2.
\]

($x_1$, $x_2$) will be a pair of independent random variables from
the same normal distribution with mean zero and unit variance.
This method is convenient because $x_1$, $x_2$ are easy to
calculate. Also, two random normal deviates are obtained with
two U(0,1) numbers. It is exact if the uniform random
numbers are accurate.
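As an illustration, the transformation can be written as a single Python function. The name `box_muller` is hypothetical, and the caller is assumed to supply $U_1$ strictly greater than zero so that the logarithm is finite.

```python
import math


def box_muller(u1, u2):
    """Map two U(0,1) numbers (u1 > 0) to two independent N(0,1) deviates."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)
```

A deviate with mean m and variance v, as needed for the numerator of the d statistic, is then obtained as m plus the square root of v times either output.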

Chi  square  Random  Number  Generation 

Chi square random numbers are needed to calculate the
denominator of the d statistic. We use two methods to
generate the random numbers.

(1) For any degrees of freedom n:

It is easy to show that, if $U \sim U(0,1)$, then

\[
Y = -2 \ln U \sim \chi^2_2.
\]

Hence, observations from a $\chi^2$ distribution with 2k degrees
of freedom can be generated by adding together the k terms

\[
\sum_{i=1}^{k} (-2 \ln U_i).
\]

For $\chi^2$ with 2k + 1 degrees of freedom we generate k U(0,1)
numbers, get the sum

\[
\sum_{i=1}^{k} (-2 \ln U_i),
\]

and add the square of a normal deviate generated by the Box
and Muller method. This method has the advantage that only
k uniform numbers need to be generated for a $\chi^2$ deviate with
2k degrees of freedom.

(2)  For  large  degrees  of  freedom: 

For chi square deviates with large degrees of freedom,
the enormous number of U(0,1) numbers that need to be generated,
and hence the length of time required for each run,
prompted us to look for a shorter method of generating chi
square random numbers.

We use the approximation by Wilson and Hilferty (1931).
The approximation is: for large n, $(\chi^2/n)^{1/3}$
is approximately normally distributed with mean $1 - \frac{2}{9n}$ and variance $\frac{2}{9n}$.
Hence, to generate a chi square number with n degrees of
freedom, we can first generate $x_i$, where $x_i \sim N(0,1)$. Then
the chi square random number is

\[
\chi^2 = n \left( 1 - \frac{2}{9n} + x_i \sqrt{\frac{2}{9n}} \right)^3. \tag{4.2}
\]

This method has the obvious advantage that only one
N(0,1) number needs to be generated for each chi square
number with n degrees of freedom.

In Wilson and Hilferty (1931), values of $\chi^2$ are calculated
for n = 1, 2, 3, 10, 30 at p = 0.80, 0.50, 0.20,
0.05, 0.01 using (4.2). Comparison with the corresponding
tabular $\chi^2$ values shows that the approximation is quite
good.
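Both methods of chi square generation are sketched below in Python for illustration. The function names are hypothetical, and the uniform and normal deviates come from the standard library generator rather than from the generator chosen for this study.

```python
import math
import random


def chi_square_method1(n, rng):
    """Method 1: sum of k = n // 2 terms -2 ln U, plus a squared normal if n is odd."""
    k = n // 2
    # 1 - rng.random() lies in (0, 1], which keeps the logarithm finite.
    total = sum(-2.0 * math.log(1.0 - rng.random()) for _ in range(k))
    if n % 2 == 1:
        u1, u2 = 1.0 - rng.random(), rng.random()
        normal = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        total += normal * normal
    return total


def wilson_hilferty(n, x):
    """Equation (4.2): chi square value for n d.f. from a standard normal deviate x."""
    c = 2.0 / (9.0 * n)
    return n * (1.0 - c + x * math.sqrt(c)) ** 3


def chi_square_method2(n, rng):
    """Method 2: one N(0,1) deviate per chi square deviate."""
    return wilson_hilferty(n, rng.gauss(0.0, 1.0))
```

As a check on (4.2), `wilson_hilferty(10, 1.644854)` is about 18.29, against the tabular upper 5% point of 18.307 for 10 degrees of freedom. For very small n and a large negative deviate the cube can be negative, which is one more reason to keep Method 2 for larger degrees of freedom.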

Checks  on  the  Generation  of  U(0,1)  Random  Numbers 

We did just a few tests on the U(0,1) random number
generator we have chosen, since it was quite well tested in
Lewis, Goodman and Miller (1969) and also because a thorough
test requires a lot of computing time. Nevertheless, our
test results indicate that the random number generator is
satisfactory.

To test the random number generator, 160,000 numbers
were generated. These numbers were divided into four subsamples,
each of size 40,000. The tests were then performed
for each subsample. The test results are summarized in
Tables 4.1 and 4.2.


Table 4.1. Tests on the Uniform Random Numbers

Table 4.2. Serial Correlation at Lags 1 to 6

Trial \ Lag          1         2         3         4         5         6

1                 .0040    -.0064    -.0056    -.0082    -.0045     .0015
2                -.0065    -.0053    -.0036     .0036     .0035     .0040
3                -.0025    -.0043    -.0056     .0041     .0003     .0006
4                 .0081    -.0009    -.0056    -.0018     .0079    -.0090

Expected value     0.00      0.00      0.00      0.00      0.00      0.00

Table 4.1 shows that the sample moments are close to
the expected values. Two goodness-of-fit tests were performed:
the $\chi^2$ goodness-of-fit test and the Kolmogorov-Smirnov
test. For the $\chi^2$ test, the number of classes into
which the numbers were grouped is determined by the Mann
and Wald criterion (Mann and Wald (1942)). For N = 40,000,
the number of classes was determined to be 261. Only one
chi square statistic computed is significant at α = .05.

The Kolmogorov-Smirnov test is based on the maximum difference
between an empirical and a specified hypothetical
cumulative distribution. The test statistic is

\[
D^* = \max_i \left| S_i(x) - F_i(x) \right|,
\]

where $S_i(x)$ = observed cumulative frequency for class i,
$F_i(x)$ = expected cumulative frequency for class i.

$D^*$ is then compared with the critical value at specified N,
the total number of observations. From Massey (1951),
for N > 35, the critical values are

\[
d_{.05} = \frac{1.36}{\sqrt{N}}, \qquad d_{.01} = \frac{1.63}{\sqrt{N}}.
\]

Thus, in our case N = 40,000, $d_{.05} = .0068$, $d_{.01} = .0082$.
None of the computed $D^*$ are significant at the .05 level.
Hence, we conclude that the generated uniform numbers have a
uniform distribution over the (0,1) interval.
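The critical values quoted above follow directly from the Massey (1951) formulas. The Python sketch below reproduces them and also computes a Kolmogorov-Smirnov statistic against U(0,1). The function names are hypothetical, and the statistic is computed here on the ungrouped ordered sample, which is an assumption; the test in this study was applied to class frequencies.

```python
import math


def ks_critical(n, coefficient=1.36):
    """Large-sample critical value: 1.36 / sqrt(N) at .05, 1.63 / sqrt(N) at .01."""
    return coefficient / math.sqrt(n)


def ks_statistic_uniform(sample):
    """Maximum distance between the empirical CDF and the U(0,1) CDF F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical CDF jumps at each observation, so both sides of the jump are checked.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
```

For N = 40,000 the two critical values are 1.36/200 = .0068 and 1.63/200 = .00815, and for N = 5000 the .05 value is about .0192.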

The serial correlation coefficients at lags 1-6 in
Table 4.2 are close to 0. Standardized values of the serial
correlation at lag one, which are approximately N(0,1)
(Anderson (1942)), were calculated and none are significant
at the .05 level. The runs up and down test and the runs above
and below the mean test were also performed, and the observed
numbers of runs agree closely with the expected numbers.

Thus our results also show that the generator by Lewis,
Goodman and Miller (1969) is satisfactory.
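A serial correlation coefficient of the kind reported in Table 4.2 can be computed as in the Python sketch below. The function name is hypothetical, and the non-circular definition used here is an assumption; the exact definition used for Table 4.2 is not restated in this section.

```python
def serial_correlation(xs, lag):
    """Lag-k serial correlation: lagged cross-products over the total sum of squares."""
    n = len(xs)
    mean = sum(xs) / n
    numerator = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    denominator = sum((x - mean) ** 2 for x in xs)
    return numerator / denominator
```

For a sequence of independent uniform numbers the coefficient at every lag should be near zero, which is the pattern seen in Table 4.2.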


Checks  on  the  Generation  of  Chi  Square  Random  Numbers 


We first did a check on the range of degrees of freedom
in which the second method of chi square number generation
is accurate. Values of $\chi^2$ for degrees of freedom n = 1, 3,
5, 6, 7, 8, 12, 24, 30, 60 at probability p = .99, .95, .70,
.20, .05, .01, .001 are calculated using (4.2). The calculated
values are compared with the theoretical values from
the $\chi^2$ tables. It is found that the approximation becomes
closer and closer as n increases. For n ≥ 8, the maximum
relative deviation from the theoretical value is 2.976% at
p = .99 and n = 8. All the other deviations are less than
1%. For n < 8, there are some larger deviations at n = 1,
3, 5. Hence, the second method should only be used for
n ≥ 8 to ensure more accuracy.

Next we test how close the simulated distributions
using both methods are to the theoretical chi square
distribution. The $\chi^2$ goodness-of-fit and the Kolmogorov-Smirnov
tests were performed. The sample means ($\bar{x}$) and
sample variances ($S^2$) were also calculated and compared with
the expected means ($\mu$) and variances ($\sigma^2$).

Four trials each consisting of 5000 chi square numbers
were performed for each selected degree of freedom.
The class width was arbitrarily fixed at 0.5 for all n. In
the $\chi^2$ goodness-of-fit test some classes were grouped to
make the expected frequencies in each class > 5. Calculation
of theoretical $\chi^2$ probabilities is based on the recurrence
formula in Abramowitz and Stegun (1964, p. 941). The recurrence
relation is

\[
P(\chi^2_f > x) = P(\chi^2_{f-2} > x) + \frac{\left(\tfrac{1}{2}x\right)^{\frac{1}{2}f - 1} e^{-x/2}}{\Gamma\!\left(\tfrac{1}{2}f\right)}.
\]
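The recurrence can be run upward from a starting value, as in the Python sketch below. The function name is hypothetical, and the starting values (the complementary error function for one degree of freedom and $e^{-x/2}$ for two) are standard results assumed here rather than taken from the text.

```python
import math


def chi_square_upper_tail(x, f):
    """P(chi-square with f d.f. > x), built up in steps of two degrees of freedom."""
    if f % 2 == 1:
        prob, g = math.erfc(math.sqrt(x / 2.0)), 1   # one degree of freedom
    else:
        prob, g = math.exp(-x / 2.0), 2              # two degrees of freedom
    while g < f:
        g += 2
        prob += (x / 2.0) ** (g / 2.0 - 1.0) * math.exp(-x / 2.0) / math.gamma(g / 2.0)
    return prob
```

Differences of this function at successive class boundaries give the expected class probabilities needed for the goodness-of-fit test.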


Some  of  the  results  are  contained  in  the  following  table 
and  are  representative  of  all  the  runs  performed. 


Table 4.3. Tests on the Chi Square Random Numbers

The sample means and variances are close to the expected
values $\mu = n$ and $\sigma^2 = 2n$. On the average, the mean
relative deviation for the two methods is .0060 for $\bar{x}$ and
.0187 for $S^2$.

On the goodness-of-fit tests, none of the $\chi^2$ or
Kolmogorov-Smirnov statistics calculated in Method 1 are
significant. The critical values for the Kolmogorov-Smirnov
test are $d_{.05} = .0192$ and $d_{.01} = .0231$. In Method 2, only
a few are significant at the .05 level and none at the .01 level.
Hence we conclude that the random numbers generated using
these two methods have approximately a $\chi^2$ distribution and
that the first method gives a better approximation to the
expected distribution than the second method. We should
thus use Method 1 to generate chi square numbers for n < 8
and use Method 2 for n ≥ 8.

C.  Results  and  Discussion  on  the  Size  of  the  Test 

Results 

The calculation of the empirical size of the Behrens-Fisher
test was carried out as described in Section A.
Tabular values $d_p$ to be verified are taken from Tables VI
and VI in Fisher and Yates (1963). Calculations were done
for degrees of freedom $n_1$, $n_2$ ranging from 3 to 24, and for
θ from 15° to 75°. Parameters ($n_1$, $n_2$, θ) were chosen according
to availability of the corresponding $d_p$ value in
the two tables. The sample size and tolerance limit ε were
5000 pairs of samples and ε = 0.10. These choices are discussed
in more detail later in this section.

As described in Section A, the actual size is calculated
by adding up the number of observations in the two
critical regions and dividing the sum by the number of pairs
of samples taken. An example of the results on the number
of actual observations in each region is given in Table 4.4.
Two trials were performed for each set of ($n_1$, $n_2$, θ) and
the average value of the actual size was taken. These
average values are presented in Table 4.5. In the table, tan θ
is, as defined previously, the experimental sample variance
ratio $S_1/S_2$. In addition, the empirical distribution of
d, values of which are calculated from each pair of samples,
is obtained for some sets of ($n_1$, $n_2$, θ) and the curves of
the distribution are drawn. Some typical results are shown
in Figures 4.1 - 4.6.

Table 4.4. The Actual Number of Observations in the Three
Regions in the Case n1 = n2 = 3, θ = 45°.

Nominal    No. of Obs. in the      No. of Obs. in the    No. of Obs. in the       Actual
Size α     left critical region    acceptance region     right critical region    Size

.10        256                     4490                  254                      .1020
.05        118                     4763                  119                      .0474
.02         42                     4911                   47                      .0178
.01         20                     4958                   22                      .0084
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Table  4.5.  Actual  Size  of  the  Behrens-Fisher  Test 


(a) n1 = n2 = 3

               Nominal Size α
θ        .10      .05      .02      .01
15°     .1010    .0464    .0172    .0090
30°     .1011    .0480    .0189    .0075
45°     .1020    .0474    .0178    .0084

(b) n1 = n2 = 5

               Nominal Size α
θ        .10      .05      .02      .01
15°     .1010    .0480    .0196    .0104
30°     .1032    .0476    .0194    .0098
45°     .1004    .0524    .0186    .0096

(c) n1 = 5, n2 = 7

               Nominal Size α
θ        .10      .05      .02      .01
15°     .1070    .0505    .0238    .0110
30°     .0984    .0510    .0210    .0092
45°     .1100    .0525    .0210    .0092
60°     .1030    .0482    .0172    .0084
75°     .1104    .0545    .0213    .0111

(d) n1 = n2 = 6

        Nominal Size α
θ        .05      .01
15°     .0487    .0088
30°     .0486    .0096
45°     .0480    .0074

(e) n1 = n2 = 8

        Nominal Size α
θ        .05      .01
15°     .0488    .0110
30°     .0498    .0118
45°     .0498    .0099

(f) n1 = 6, n2 = 12

        Nominal Size α
θ        .05      .01
15°     .0500    .0103
30°     .0519    .0092
45°     .0508    .0100
60°     .0518    .0099
75°     .0499    .0099

(g) n1 = n2 = 12

        Nominal Size α
θ        .05      .01
15°     .0509    .0092
30°     .0504    .0090
45°     .0504    .0095

(h) n1 = 12, n2 = 24

        Nominal Size α
θ        .05      .01
15°     .0493    .0104
30°     .0488    .0090
45°     .0500    .0094
60°     .0491    .0097
75°     .0496    .0094

(i) n1 = n2 = 24

        Nominal Size α
θ        .05      .01
15°     .0507    .0102
30°     .0501    .0100
45°     .0501    .0104

To summarize the results in Table 4.5 (a) to (i) and
to see more clearly how close the actual sizes are to the
respective nominal sizes, we calculated the deviation of
each actual size from the nominal size. The maximum absolute
deviation corresponding to each $n_1$, $n_2$ and nominal
size is then obtained and entered in Table 4.6. Thus for
$n_1 = n_2 = 3$, α = .05, for example, from Table 4.5 (a), the
deviations are -.0036, -.0020, -.0026 at θ = 15°, 30°, 45°
respectively. A + sign indicates a positive deviation
(i.e. actual size is larger than the nominal size) and a
- sign indicates a negative deviation (i.e. actual size
is less than the nominal size). The maximum absolute value
of these three deviations is | -.0036 | at θ = 15°. Hence
we enter -.0036 in the .05 column and 15° in the last column
of the first row of Table 4.6.
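The worked example above can be reproduced with a few lines of Python, using the α = .05 column of Table 4.5 (a). The function names are hypothetical. The last function takes the mean of the absolute deviations, which is an inference from the tabulated values about how the averages in Table 4.7 were formed.

```python
def deviations(actual_sizes, nominal):
    """Signed deviation of each actual size from the nominal size, keyed by theta."""
    return {theta: size - nominal for theta, size in actual_sizes.items()}


def max_deviation(actual_sizes, nominal):
    """Signed deviation with the largest absolute value, and the theta where it occurs."""
    devs = deviations(actual_sizes, nominal)
    theta = max(devs, key=lambda t: abs(devs[t]))
    return round(devs[theta], 4), theta


def mean_abs_deviation(actual_sizes, nominal):
    """Average of the absolute deviations, rounded to four places."""
    devs = deviations(actual_sizes, nominal)
    return round(sum(abs(v) for v in devs.values()) / len(devs), 4)


# Table 4.5 (a), n1 = n2 = 3, column alpha = .05, keyed by theta in degrees.
sizes_05 = {15: 0.0464, 30: 0.0480, 45: 0.0474}
```

Here `max_deviation(sizes_05, 0.05)` gives the entry -.0036 at 15°, and `mean_abs_deviation(sizes_05, 0.05)` gives .0027.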

The table is further divided into two subtables.
Table 4.6 (a) contains the maximum deviations for small
odd degrees of freedom and α = .10, .05, .02, .01. Table
4.6 (b) contains maximum deviations for even degrees of
freedom and α = .05, .01. Table 4.7 contains the average
deviations for each degree of freedom. Tables 4.6 and
4.7 follow.

(b) Even degrees of freedom.

Degrees of Freedom    Maximum Deviations from Nominal Size α
n1      n2            .05         .01         θ

 6       6            -.0020      -.0026      45°
 8       8            -.0012                  15°
 8       8                        +.0018      30°
 6      12            +.0019      -.0008      30°
12      12            +.0009                  15°
12      12                        -.0010      30°
12      24            -.0012      -.0010      30°
24      24            +.0007                  15°
24      24                        +.0004      45°

Table 4.7. Average of the Deviation for
Various Degrees of Freedom

Degrees of Freedom    Average Deviations from Nominal Size α
n1      n2            .05        .01

 3       3            .0027      .0017
 5       5            .0023      .0003
 5       7            .0021      .0011
 6       6            .0016      .0014
 8       8            .0005      .0010
 6      12            .0009      .0003
12      12            .0006      .0008
12      24            .0006      .0006
24      24            .0003      .0002

Discussion


Choice  of  Sample  Size 

Trials of 500, 1000, 2000, 5000 and 10,000 pairs
of samples were carried out for a few sets of parameters
($n_1$, $n_2$, θ) at the beginning to determine the best sample
size to use. With 500, 1000, and 2000 pairs of samples,
for several choices of $n_1$, $n_2$, θ, the results
were not stable. Some of the calculated empirical sizes
deviated (absolutely) as much as 2% from the nominal size,
while others were very close to it. There was increased
stability of results when samples were increased to 5000
and 10,000 pairs. The results obtained were consistently
close to the nominal size.

To decide between using 5000 and 10,000 pairs of
samples, the results on actual size and the execution time
using these two sample sizes were considered. We found that:

(a) Stability of results on actual size is similar
for both sample sizes.

(b) Using 10,000 pairs takes on the average 12
minutes on each run, while using 5000 pairs
takes about 5-6 minutes.

After these considerations, we decided that 5000
pairs of samples is the optimum sample size to take.


Choice of Tolerance Limit ε

The tolerance limit ε was decided by trying values
of ε from .025 to .10. In the case $n_1 = n_2 = 8$ and θ = 30°,
for 1000 pairs of samples, we find that the execution time
is 40 seconds for ε = 0.10, 1.2 minutes for ε = 0.05, and 3
minutes for ε = .025. This increase in execution time with
decreasing ε is expected because when ε is decreased, the
tolerance interval $|S_1^2/S_2^2 - C| / C \le \varepsilon$ is narrowed. Hence
there is a greater chance of samples with the $S_1^2/S_2^2$ ratio
lying outside the limit.

A counter was set up to count the number of pairs
of samples rejected before a pair of samples with the
$S_1^2/S_2^2$ ratio within the tolerance limit was obtained. The average
of the number of rejected samples in all the samples was then
taken. The average found was 172.6 for ε = .025, 49.94 for
ε = .05 and 24.08 for ε = .10.

With our choice of sample size to be 5000, it is
too time consuming to use ε = .025 in all the runs. Values
of ε = .05 and .10 were then tried using 5000 pairs of samples.
The average execution time and the average number of
rejected samples are 8.7 minutes and 89.17 samples respectively
for ε = .05. For ε = .10 it takes 5.5 minutes, or
about two thirds of the time, and 55.28 rejected samples. Again, we
did not find any significant difference in the actual sizes
obtained. With 5000 pairs of samples, ε = .10 seems to be
the most reasonable limit to use. Hence, in all subsequent
trials, a sample size of 5000 pairs and a tolerance limit
ε = 0.10 were used.

Use of the Chi Square Approximation

The chi square approximation, or Method 2 of chi
square number generation as discussed in Section B of this
chapter, is more accurate for larger degrees of freedom.
Hence it was used to generate the chi square random numbers
only for the larger degrees of freedom $n_1 = n_2 = 12$; $n_1 = 12$
and $n_2 = 24$; and $n_1 = n_2 = 24$. For other degrees of freedom,
Method 1 of Section B was used. The execution
time using the chi square approximation is much shortened.
It takes about 2 minutes to run 5000 pairs of samples using
the approximation and 5.5 minutes without, both with the
same set of $n_1$, $n_2$ and θ.

Range of n1, n2, θ and the d_p Values Verified

When n1 = n2, the tabular d_p values at θ = 15° and 30° are
the same as the d_p values at θ = 75° and 60° respectively.
Hence only the values at θ = 15°, 30°, and 45° are verified.
When n1 ≠ n2, d_p at θ = 60°, 75° is the same as the d_p
value at 30° and 15° respectively, with n1 and n2
interchanged. For θ = 0° or 90°, the d statistic has the
Student's t distribution with n1 or n2 degrees of freedom.
Hence the critical values of d for these values of θ are the
t values with the corresponding degrees of freedom taken
from the Student's t table. In this way the d_p values that
we have verified cover most of the two tables in Fisher and
Yates (1963), except for the values corresponding to n1 or
n2 < 3. For these degrees of freedom, tests done to check the
z approximation we used show that the calculated z values
have a larger relative deviation than for degrees of freedom
≥ 3. There is a large relative deviation of 27.5% for degrees
of freedom n1 = n2 = 1 at the tail of the z distribution. For
degrees of freedom 3, the largest relative deviation is
2.182%, while the others are less than 1%. Hence in order to
obtain the correct z_p and therefore the correct σ1²/σ2² for
each p, we decided to run the trials starting from
n1, n2 ≥ 3.

The available percentage points of d_p for each chosen set
(n1, n2, θ) are verified in the same program. Thus for small
odd degrees of freedom the .10, .05, .02 and .01 points of
d_p are verified in the same program, and for even degrees
of freedom, the .05 and .01 points are verified.
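The symmetry relations and the limiting case at θ = 0° used in this section can be checked numerically. The sketch below assumes the usual fiducial representation of the Behrens-Fisher statistic, d = t1 sin θ - t2 cos θ, with t1 and t2 independent Student's t variables on n1 and n2 degrees of freedom; this representation comes from the standard literature rather than from the passage above. Function names, seeds and the number of draws are hypothetical, and the Monte Carlo points are rough estimates, not replacements for the tabulated d_p values.

```python
import math
import random

def student_t(df, rng):
    """Draw one Student's t deviate as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)

def d_upper_point(n1, n2, theta_deg, upper_tail, draws=100000, seed=1):
    """Monte Carlo estimate of the point exceeded by d with probability upper_tail.

    Assumes d = t1*sin(theta) - t2*cos(theta) (standard fiducial form).
    """
    rng = random.Random(seed)
    theta = math.radians(theta_deg)
    s, c = math.sin(theta), math.cos(theta)
    d = sorted(student_t(n1, rng) * s - student_t(n2, rng) * c
               for _ in range(draws))
    return d[int((1.0 - upper_tail) * draws)]

# theta = 0 degrees: d reduces to -t2, so its upper point is a t point on n2 d.f.
at_zero = d_upper_point(8, 8, 0.0, 0.025)
# n1 != n2: theta = 15 degrees should agree with theta = 75 degrees
# once n1 and n2 are interchanged (independent seeds, so only approximately).
a = d_upper_point(6, 12, 15.0, 0.025, seed=2)
b = d_upper_point(12, 6, 75.0, 0.025, seed=3)
print(at_zero, a, b)
```

With 8 degrees of freedom the upper 2.5% point of Student's t is about 2.306, so `at_zero` should land near that value, and `a` and `b` should agree up to simulation noise.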


On  the  Four  Tables  Presented 

Table  4.4  shows  a  typical  result  on  the  actual 
number  of  observations  falling  in  the  critical  and  accep¬ 
tance  regions.  The  expected  numbers  in  each  region  for  the 
various  nominal  sizes  (a)  are: 


α = .10:   250,  4500,  250
α = .05:   125,  4750,  125
α = .02:    50,  4900,   50
α = .01:    25,  4950,   25

Comparing  the  expected  and  actual  values,  we  see 
that  they  are  close  to  each  other. 
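As a worked check, the expected numbers follow from 5000 pairs of samples with α/2 in each critical region and 1 - α in the acceptance region. The sketch below recomputes them and, as an added aid for judging "close", reports the binomial standard deviation of a tail count; treating the 5000 pairs as independent trials is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def expected_counts(n_pairs, alpha):
    """Expected numbers in the lower critical, acceptance and upper critical regions."""
    tail = n_pairs * alpha / 2.0
    return round(tail), round(n_pairs - 2.0 * tail), round(tail)

def tail_count_sd(n_pairs, alpha):
    """Binomial standard deviation of the count in one tail of size alpha/2.

    Assumes the pairs of samples are independent trials.
    """
    p = alpha / 2.0
    return math.sqrt(n_pairs * p * (1.0 - p))

for alpha in (0.10, 0.05, 0.02, 0.01):
    print(alpha, expected_counts(5000, alpha), round(tail_count_sd(5000, alpha), 1))
```

An observed tail count within one or two such standard deviations of its expected value is what sampling variation alone would produce.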

The results on the actual size of the test are presented in
Tables 4.5, 4.6 and 4.7. They can be summarized as follows:

(1)  The actual sizes are on the whole quite close to the
nominal size.

(2)  There is an almost equal number of positive and negative
deviations from the nominal size. This indicates that the
Behrens-Fisher test does not yield significance levels much
lower than α, contrary to the claims in other papers which
have investigated the test.

(3)  Critical  values  at  small  odd  degrees  of  free¬ 
dom  on  the  average  yield  larger  deviations 
from  the  nominal  size  than  those  at  larger 
degrees  of  freedom  as  seen  from  Tables  4.6 
and  4.7. 

We  therefore  conclude  that  the  Behrens-Fisher 
test  gives  actual  sizes  which  are  close  to  a  and  there  is 
no  indication  that  they  are  much  lower  than  a.  Furthermore, 
the  performance  of  the  test  in  terms  of  actual  size  is 
better  for  larger  even  degrees  of  freedom. 

On  the  Distribution  of  the  d  Statistic 

The distribution of the d statistic was studied empirically.
From Figures 4.1 - 4.6, it is seen that the distribution is
symmetric about zero. Also, as the number of degrees of
freedom increases, the distribution becomes less steep, more
spread out and approaches a bell-shaped curve. We expect the
curve to approach a normal curve as n increases, since the d
statistic approaches normality as n approaches infinity.

Results on Actual Size Using the Verification Procedure in
Fisher (1939)

The procedure is as described in Chapter 3. The method of
calculating z is the same one used in the verification based
on Fisher (1961). Various values of N and hence

p_i = (2i - 1)/(2N),  i = 1, ..., N

were used at the beginning to determine the best N to use.
The results are presented in Table 4.8. Results on actual
size using

p_i = (2i - 1)/512,  i = 1, ..., 256,

are summarized in Tables 4.9 and 4.10.
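The probability levels p_i are the midpoints of N equal subintervals of (0, 1). A minimal sketch of that grid follows, together with a toy demonstration of how an equally weighted average over the grid converges as N grows. Reading the p_i as midpoint-rule nodes is an interpretation made for this sketch (the actual procedure is the one in Chapter 3), and the function names and the toy integrand are hypothetical.

```python
from fractions import Fraction

def midpoint_probabilities(n_points):
    """Return p_i = (2i - 1) / (2N) for i = 1, ..., N as exact fractions."""
    return [Fraction(2 * i - 1, 2 * n_points) for i in range(1, n_points + 1)]

def midpoint_average(func, n_points):
    """Equally weighted average of func over the grid, i.e. a midpoint-rule
    estimate of the integral of func(p) for p in (0, 1)."""
    grid = midpoint_probabilities(n_points)
    return sum(func(float(p)) for p in grid) / n_points

grid = midpoint_probabilities(256)
print(grid[0], grid[-1])  # first and last probability levels for N = 256

for n in (16, 64, 256):
    # Toy integrand p**2 (exact integral 1/3), used only to show convergence in N.
    print(n, midpoint_average(lambda p: p * p, n))
```

The cost grows in proportion to N while the gain in accuracy shrinks quickly, which is the same trade-off that motivates stopping at a moderate N.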

Table 4.8.  Probabilities That the Tabulated Values of d are
            Exceeded at Various N

(n1 = n2 = 8, θ = 75°, α = nominal size)

            Probabilities of Exceeding d
   N        α/2 = .025      α/2 = .005

   16         .02452          .00474
   32         .02465          .00474
   64         .02482          .00486
  128         .02491          .00492
  256         .02496          .00496
  512         .02500          .00500
 1000         .02500          .00500

Table 4.9.  Probabilities that the Tabulated Values of d are
            Exceeded (p_i = (2i - 1)/512, i = 1, 2, ..., 256,
            α = nominal size) for Even Degrees of Freedom

(a)

   θ \ α/2      .025       .005

   15°          .0249      .00495
   30°          .0249      .00493
   45°          .0250      .00492

(b)  n1 = n2 = 8

   θ \ α/2      .025       .005

   15°          .0249      .00496
   30°          .0249      .00496
   45°          .0249      .00495

(c)  n1 = 12, n2 = 24

   θ \ α/2      .025       .005

   15°          .0249      .00500
   30°          .0250      .00499
   45°          .0249      .00498
   60°          .0249      .00498
   75°          .0249      .00498

Table 4.10.  Probabilities that the Tabulated Values of d are
             Exceeded (p_i = (2i - 1)/512, i = 1, 2, ..., 256,
             α = nominal size) for Odd Degrees of Freedom

(a)  n1 = n2 = 3

   θ \ α/2      .05       .025      .01        .005

   15°          .0500     .0250     .00999     .00497
   30°          .0501     .0251     .00999     .00494
   45°          .0501     .0251     .0100      .00495

(b)  n1 = 3, n2 = 5

   θ \ α/2      .05       .025      .01        .005

   15°          .0500     .0250     .0100      .00498
   30°          .0499     .0250     .00996     .00496
   45°          .0499     .0249     .00994     .00493
   60°          .0499     .0249     .00991     .00490
   75°          .0499     .0249     .0099      .00488

(c)  n1 = 3, n2 = 7

   θ \ α/2      .05       .025      .01        .005

   15°          .0499     .0249     .00998     .00488
   30°          .0499     .0249     .00994     .00493
   45°          .0499     .0249     .00996     .00496
   60°          .0500     .0249     .00991     .00498
   75°          .0500     .0250     .0100      .00490
Discussion

In determining the best N to use, it is seen from Table 4.8
that as N increases, the actual size approaches the nominal
size. N = 256, or p_i = (2i - 1)/512, was used because it
gives close actual sizes and a further increase in N does not
improve the results to a significant extent. Moreover, the
time needed for execution increases rapidly as N becomes
large.

From the results shown in Tables 4.9 and 4.10, it is again
seen that the Behrens-Fisher test yields actual sizes close
to the nominal size. This verification, using the
relationship between d and Student's t distribution, further
shows that the Behrens-Fisher test rejects the null
hypothesis when it is true with a frequency close to the
nominal size α.

A Comparison With Other Investigations of the Behrens-Fisher
Test

The more recent investigations of the test are, as mentioned
in Chapter 2, Bennett and Hsu (1961), Mehta and Srinivasan
(1970) and Wang (1971). In addition, Yates (1964) also
clarified some points on fiducial probability and the
Behrens-Fisher test.

In the first three papers above, the test was examined in
the context of the confidence interval approach, namely,
that in repeated sampling of any fixed population, the size
must be equal to the frequency with which the hypothesis is
rejected. Hence, the actual size of the test was calculated
for populations with various sets of arbitrarily fixed
σ1²/σ2². The actual sizes calculated were all found to be
much less than the nominal size.

However, Fisher arrived at his solution using the fiducial
approach, which differs from the confidence interval
approach. In the fiducial argument, the statement that "in
repeated sampling of any fixed population, the level of
significance must be equal to the frequency with which the
hypothesis is rejected" is not a requirement for a test of
significance. In particular, repeated sampling from fixed
populations is foreign to the approach (see Fisher (1955)).
The fiducial solution is based on the two suppositions
stated at the end of Chapter 3 (p.26), and will yield actual
sizes close to α when the true variance ratio follows the z
distribution rather than being held fixed during the
verification.

It should thus be within the context of the fiducial
approach that we investigate the test. From the results that
we have obtained, it is clear that a proper verification
based on the assumptions of the test yields sizes close to
α. We thus conclude that the Behrens-Fisher test can be
recommended for practical use for the following two reasons:

(1)  The test rejects the hypothesis when it is in fact true
with a frequency close to the nominal size α.

(2)  It is convenient to use because tables of critical
values of d are available for a relatively wide range of
(n1, n2, θ), including small odd degrees of freedom.
[Figures 4.1 - 4.6: plots titled "DISTRIBUTION OF D", horizontal axis D.]

Figure 4.1.  Distribution of d at ... = 5, θ = 45°.
Figure 4.2.  Distribution of d at n1 = n2 = 8, θ = 15°.
Figure 4.3.  Distribution of d at n1 = n2 = 8, θ = 30°.
Figure 4.4.  Distribution of d at n1 = n2 = 8, θ = 45°.
Figure 4.5.  Distribution of d at n1 = 12, n2 = ...
Figure 4.6.  Distribution of d at ... = 24, θ = ...
D.  Results and Discussion on the Power of the Test

Introduction

The power of a test of hypothesis is defined as the
probability that the null hypothesis will be rejected when
it is in fact false. The power when no deviation exists is
then the probability that a true null hypothesis will be
rejected, or in other words, the actual size of the test. In
choosing a test statistic, one ideally wants the power to be
as large as possible when the null hypothesis is false.

In the Behrens-Fisher problem, we have the null hypothesis
H0: μ1 = μ2, and the alternative hypotheses H1: μ1 ≠ μ2,
μ1 > μ2 or μ1 < μ2. In the literature we have covered, the
powers and sizes of the various proposed tests for the
problem were calculated and compared in order to decide
their merits (e.g. Mehta and Srinivasan (1970), Bennett and
Hsu (1961)). It was found that the size and power of the
Behrens-Fisher test are much lower than those of the other
tests.

In the previous section, we examined the size of the test
taking into consideration the assumptions of the fiducial
solution. We found the actual size to be close to the
nominal size. We expect that, using the same procedure, the
power performance of the test would also be good. We thus
use the same procedure to examine the power of the test.


Procedure 


The procedure to calculate the power of the test is
essentially the same as that for calculating the actual
size, except that in the former the true difference in the
population means is nonzero. The same samples generated to
calculate the actual size are thus used to calculate the
power of the test. The only difference is in the numerator
of the d statistic, which is now normally distributed with a
nonzero mean equal to μ1 - μ2.
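The reuse of the same samples can be sketched as follows: the numerator of each stored d value is shifted by the true mean difference and the rejection frequency is recounted. This is a minimal illustration under stated assumptions, not the program used in the study. It assumes that the numerators and denominators of d were kept for each pair of samples and that rejection is in the lower tail; the function name, the toy data and the critical value are all hypothetical.

```python
def empirical_power(numerators, denominators, shift, lower_crit):
    """Relative frequency of shifted d values falling below lower_crit.

    numerators   : differences of sample means generated under the null
    denominators : matching denominators of the d statistic
    shift        : true mean difference mu1 - mu2 added to every numerator
    lower_crit   : lower critical value of d (hypothetical input)
    """
    rejected = 0
    for num, den in zip(numerators, denominators):
        d = (num + shift) / den
        if d < lower_crit:
            rejected += 1
    return rejected / len(numerators)

# Toy data standing in for stored simulation output (not results from the study).
nums = [0.5, -0.2, 1.0, -1.5]
dens = [1.0, 0.5, 2.0, 1.0]
for shift in (0.0, -1.0, -3.0):
    print(shift, empirical_power(nums, dens, shift, lower_crit=-2.0))
```

Because every value of the mean difference is applied to the same stored samples, the estimated powers at different differences are computed from one set of random numbers rather than from fresh draws.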

The alternative hypothesis used in our calculations is
H1: μ1 < μ2. For each set of parameters (n1, n2, θ) and
μ1 = 0, μ2 is allowed to vary from 1 to 6. The power for
each of the size specifications was calculated from the same
sample as the relative frequency with which the observed d
values exceed the respective critical values. That is,

power of the test = rel. frequency (d < lower critical value).

The empirical distribution of d under the alternative
hypotheses μ2 - μ1 = 1 and μ2 - μ1 = 5 was also obtained,
and plots were drawn for several cases.

Results

Some typical results on the power of the test for some sets
of parameters (n1, n2, θ) are tabulated in Table 4.11. Power
curves for some results are drawn and presented in Figures
4.7 - 4.10. Curves for the empirical distributions of d are
presented in Figures 4.11 - 4.14.
Table 4.11.  Power of the Behrens-Fisher Test for Various δ,
             where δ = μ2 - μ1 and the Nominal Size = α

(a)  n1 = n2 = 3

θ = 15°
   α \ δ    0.0    1.0    2.0    3.0    4.0    5.0    6.0

   .10     .101   .121   .241   .385   .520   .653   .757
   .05     .046   .057   .128   .234   .361   .476   .595
   .02     .017   .018   .048   .100   .174   .265   .366
   .01     .009   .010   .022   .047   .088   .144   .217

θ = 30°
   α \ δ    0.0    1.0    2.0    3.0    4.0    5.0    6.0

   .10     .101   .219   .484   .720   .864   .934   .970
   .05     .048   .119   .330   .558   .745   .868   .934
   .02     .019   .048   .048   .164   .345   .529   .700
   .01     .008   .024   .088   .207   .361   .519   .664

θ = 45°
   α \ δ    0.0    1.0    2.0    3.0    4.0    5.0    6.0

   .10     .102   .310   .686   .898   .971   .991   .997
   .05     .047   .188   .514   .783   .925   .976   .992
   .02     .018   .086   .300   .570   .781   .915   .966
   .01     .008   .041   .185   .404   .617   .790   .906

(b)  n1 = 6, n2 = 12

θ = 15°
   α \ δ    0.0    1.0    2.0    3.0    4.0    5.0    6.0

   .05     .050   .111   .292   .529   .730   .871   .942
   .01     .010   .032   .116   .285   .487   .677   .821

θ = 30°
   α \ δ    0.0    1.0    2.0    3.0    4.0    5.0    6.0

   .05     .052   .259   .697   .932   .987   .999   1.0
   .01     .009   .100   .434   .794   .949   .989   .999

θ = 45°
   α \ δ    0.0    1.0    2.0    3.0    4.0    5.0    6.0

   .05     .051   .408   .885   .992   1.0    1.0    1.0
   .01     .010   .181   .697   .952   .996   1.0    1.0

θ = 60°
   α \ δ    0.0    1.0    2.0    3.0    4.0    5.0    6.0

   .05     .052   .521   .970   .999   1.0    1.0    1.0
   .01     .010   .242   .826   .990   1.0    1.0    1.0

θ = 75°
   α \ δ    0.0    1.0    2.0    3.0    4.0    5.0    6.0

   .05     .050   .583   .989   1.0    1.0    1.0    1.0
   .01     .010   .261   .881   .998   1.0    1.0    1.0
Discussion

Table 4.11 and Figures 4.7 - 4.10 may be summarized as
follows:

(i)  For a given set (n1, n2, θ) the power increases for
increasing test size.

(ii)  For fixed θ and size, the power increases with
increases in n1 or n2.

(iii)  For fixed n1, n2 and the size of the test, the power
increases with increasing sample variance ratio.

These results are calculated with the assumptions of the
fiducial solution. A direct comparison with the results in
Mehta and Srinivasan (1970) and Bennett and Hsu (1961) is
not possible since they used the confidence interval
approach and calculated the power with respect to n1, n2 and
the true variance ratio. However, since the actual size of
the test we calculated is close to the nominal size, the
power is also raised correspondingly. We thus say that the
power performance of the test is acceptable for the
practical user.

Figures 4.11 - 4.14 show that when the null hypothesis is
not true, i.e. μ2 > μ1, the distribution of d becomes more
and more shifted to the left as the difference between the
true means increases and also as θ increases.
[Figures 4.7 - 4.9: power curves, vertical axis POWER. Figures 4.13 - 4.14: plots titled "DISTRIBUTION OF D", vertical axis REL. FREQUENCY.]

Figure 4.7.  Typical Empirical Power Curves of the
             Behrens-Fisher Test for Fixed Sample Sizes, θ
             and Various Nominal Test Sizes.

Figure 4.8.  Typical Empirical Power Curves of the
             Behrens-Fisher Test for Fixed Nominal Size and θ
             and Various (n1, n2). Nominal Size = .01,
             θ = 30°.

Figure 4.9.  Typical Empirical Power Curves of the
             Behrens-Fisher Test for Fixed Nominal Size and θ
             and Various (n1, n2). Nominal Size = .05,
             θ = 30°.

Figure 4.13.  The Empirical Distribution of the d Statistic
              Under the Null Hypothesis μ2 - μ1 = 0, and the
              Alternative Hypotheses μ2 - μ1 = 1, μ2 - μ1 = 5
              (n1 = 8, n2 = 8, θ = 45°)

Figure 4.14.  The Empirical Distribution of the d Statistic
              Under the Null Hypothesis μ2 - μ1 = 0, and the
              Alternative Hypotheses μ2 - μ1 = 1, μ2 - μ1 = 5
              (n1 = 24, n2 = 24, θ = 45°)

CHAPTER  V.  CONCLUSION 


We have made a survey of some of the major solutions
proposed for the Behrens-Fisher problem. The Behrens-Fisher
solution, the first to be proposed, has been criticized for
giving an actual size of the test much lower than the
nominal size. We have examined the solution and showed in
our sampling study, based on the assumptions of the test,
that it actually yields an empirical size close to the
nominal size. This conclusion applies to both small and
large degrees of freedom, and especially to larger degrees
of freedom. The power of the test was studied using the same
approach, and observations were made on the power
performance with changes in n1, n2 and θ.

We thus consider that the objectives of this study have been
met. The Behrens-Fisher test is recommended for use for the
following two reasons. Firstly, the test actually yields
empirical sizes close to the size specified by the user.
Secondly, tables for its use are available for a wider range
of degrees of freedom and sample variance ratio than some of
the other solutions proposed. Based on the range of n1 and
n2 that we have verified, the test can be used for the whole
range of n1, n2 covered in the available tables, except
perhaps for n1 and n2 < 3, which we have not verified.
Moreover, the test is most suitable for use for large n1 and
n2, say n1 and n2 > 6, since on the average the actual sizes
at these degrees of freedom have a smaller deviation from
the nominal size.

On the method of calculating the actual size, our procedure
could certainly be improved by, say, narrowing the tolerance
limit ε in obtaining samples with the correct S1/S2 ratio
and increasing further the number of pairs of samples taken.
This would be expensive to do but would increase the
accuracy of the results.

Calculation  and  tabulation  of  critical  values  of  d 
for  more  degrees  of  freedom  would  be  helpful  for  the  user 
of  the  test. 

On the other hand, the general appearance of the empirical d
distribution in Figures 4.1 - 4.6 in Chapter IV, and the
approximation of Patil (1964), suggest the possibility of
approximating the d distribution by the t distribution.
Hence another aspect of the test that is worth looking into
is the possibility of using the t tables, which are more
readily available, for the critical values of d. One could
investigate the approximation at various degrees of freedom
and sample variance ratios. If the approximation is good,
then the Behrens-Fisher test will be even more accessible
for practical use.